2024-03-29T10:29:51Z
https://meral.edu.mm/oai
oai:meral.edu.mm:recid/4343
2022-03-24T23:12:21Z
user-ucsy
Performance-Aware Data Placement Policy for Hadoop Distributed File System
Soe, Nang Kham
Yee, Tin Tin
Htoon, Ei Chaw
Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. The Hadoop Distributed File System (HDFS) is the underlying file system of a Hadoop cluster. The default HDFS data placement strategy works well in a homogeneous cluster, but it performs poorly in heterogeneous clusters because the nodes differ in capability: some computing nodes may become overloaded, reducing Hadoop performance. HDFS therefore has to rely on a load-balancing utility so that data is placed evenly across the Hadoop cluster. However, because each node in a heterogeneous Hadoop cluster has a different computing capacity, this can incur the overhead of transferring unprocessed data from slow nodes to fast nodes. To solve these problems, a data/replica placement policy based on the storage utilization and computing capacity of each data node in a heterogeneous Hadoop cluster is proposed. The proposed policy aims to reduce both the overload on some computing nodes and the overhead of data transmission between different computing nodes.
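The capacity-aware placement idea from the abstract can be illustrated with a minimal sketch: each incoming block is assigned to the data node that maximizes a weighted score of computing capacity and remaining storage. All class names, the scoring weights, and the greedy selection rule are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of capacity- and storage-aware block placement.
# The weights and scoring function are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class DataNode:
    name: str
    compute_capacity: float    # relative processing speed, normalized (e.g. 1.0 = baseline)
    storage_total: float       # total storage in GB
    storage_used: float = 0.0  # currently used storage in GB

    def storage_free_ratio(self) -> float:
        # Fraction of this node's storage that is still free.
        return 1.0 - self.storage_used / self.storage_total

def place_block(nodes, block_size_gb, w_compute=0.5, w_storage=0.5):
    """Choose the node maximizing a weighted score of computing capacity
    and free-storage ratio; nodes without room for the block are skipped."""
    candidates = [n for n in nodes
                  if n.storage_total - n.storage_used >= block_size_gb]
    if not candidates:
        raise RuntimeError("no data node has room for the block")
    best = max(candidates,
               key=lambda n: w_compute * n.compute_capacity
                             + w_storage * n.storage_free_ratio())
    best.storage_used += block_size_gb  # account for the placed block
    return best.name
```

With these weights, a faster node keeps attracting blocks until its storage fills up, after which placement falls back to slower nodes, which is the intended effect: proportionally more data lands where it can be processed locally, reducing transfers from slow to fast nodes.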
2018-02-22
http://hdl.handle.net/20.500.12678/0000004343
https://meral.edu.mm/records/4343