<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-17T08:38:39Z</responseDate>
  <request metadataPrefix="oai_dc" identifier="oai:meral.edu.mm:recid/6387" verb="GetRecord">https://meral.edu.mm/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:meral.edu.mm:recid/6387</identifier>
        <datestamp>2021-12-13T04:43:15Z</datestamp>
        <setSpec>1582963342780:1605779935331</setSpec>
        <setSpec>user-uit</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Resource-based Data Placement Strategy for Hadoop Distributed File System</dc:title>
          <dc:creator>Nang Kham Soe</dc:creator>
          <dc:creator>Tin Tin Yee</dc:creator>
          <dc:creator>Ei Chaw Htoon</dc:creator>
          <dc:description>Big Data is a term for data sets that are so large or
complex that traditional data processing tools are inadequate to process or
manage them. Apache Hadoop is an open-source software framework for
distributed storage and distributed processing of very large data sets on
computer clusters built from commodity hardware. The default Hadoop data
placement strategy works well in a homogeneous cluster, but it performs
poorly in heterogeneous clusters because the nodes differ in capability (in
terms of processing power, memory, throughput, I/O, etc.), which can cause
load imbalance and reduce Hadoop performance. The Hadoop Distributed File
System (HDFS) therefore has to rely on a load-balancing utility to even out
the data distribution. This utility consumes extra system resources and
running time, and although it places data evenly across the Hadoop cluster,
it can still incur the overhead of transferring unprocessed data from slow
nodes to fast nodes, because each node in a heterogeneous Hadoop cluster has
a different computing capacity. To solve these problems, a data/replica
placement algorithm based on the storage utilization and computing capacity
of each data node in a heterogeneous Hadoop cluster is proposed. The
proposed policy can balance the workload as well as reduce the overhead of
data transmission between different computing nodes.</dc:description>
          <dc:date>2017-11-02</dc:date>
          <dc:identifier>http://hdl.handle.net/20.500.12678/0000006387</dc:identifier>
          <dc:identifier>https://meral.edu.mm/records/6387</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
