MapReduce is a popular programming model for processing large-scale datasets in distributed environments and is a fundamental component of current cloud computing and big data applications. In this paper, a heartbeat mechanism for MapReduce Task Scheduler using Dynamic Calibration (HMTS-DC) is proposed to address the problem of unbalanced node computation capacity in a heterogeneous MapReduce environment. HMTS-DC uses two mechanisms to dynamically adapt and balance the tasks assigned to each compute node: 1) using heartbeats to dynamically estimate the capacity of the compute nodes, and 2) using the data locality of replicated data blocks to reduce data transfer between nodes. With the first mechanism, the task scheduler can dynamically estimate the computational capacity of each node based on the heartbeats received during the early stage of the job. With the second mechanism, the unprocessed tasks local to each compute node are reassigned and reserved, so that nodes with greater capacities reserve more local tasks than their weaker counterparts. Experimental results show that HMTS-DC performs better than Hadoop and the Dynamic Data Placement Strategy (DDP) in a dynamic environment. Furthermore, an enhanced HMTS-DC (EHMTS-DC) is proposed that incorporates historical data. In contrast to the "slow start" property of HMTS-DC, EHMTS-DC relies on the historical computation capacity of the slave machines. The experimental results show that EHMTS-DC outperforms HMTS-DC in a dynamic environment.
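The two mechanisms, plus the history-based start of EHMTS-DC, can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption for illustration: capacity is taken as completed tasks per second from each node's latest heartbeat, local tasks are reserved greedily on the replica holder with the smallest projected load-to-capacity ratio, and the historical blend uses a hypothetical fixed `weight`. The names `Heartbeat`, `estimate_capacity`, `blend_with_history`, and `reserve_local_tasks` are hypothetical and are not taken from Hadoop or from the paper.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    node: str             # compute node that sent the heartbeat
    time: float           # seconds since job start (hypothetical field)
    tasks_completed: int  # cumulative tasks finished on this node (hypothetical field)


def estimate_capacity(heartbeats):
    """Assumed model: capacity = completed tasks per second, computed from
    the most recent heartbeat received from each node."""
    latest = {}
    for hb in heartbeats:
        if hb.node not in latest or hb.time > latest[hb.node].time:
            latest[hb.node] = hb
    return {
        node: (hb.tasks_completed / hb.time if hb.time > 0 else 0.0)
        for node, hb in latest.items()
    }


def blend_with_history(observed, history, weight=0.5):
    """EHMTS-DC-style idea (assumed form): use historical capacity when no
    heartbeat data exists yet, otherwise mix history with observations."""
    blended = {}
    for node in set(observed) | set(history):
        if node not in observed:
            blended[node] = history[node]
        elif node not in history:
            blended[node] = observed[node]
        else:
            blended[node] = weight * history[node] + (1 - weight) * observed[node]
    return blended


def reserve_local_tasks(capacity, local_tasks):
    """Reserve each unprocessed task on a node that holds a replica of its
    input block, greedily choosing the holder whose projected load
    (reserved tasks / capacity) would stay smallest.

    local_tasks maps task id -> list of nodes holding a replica.
    Returns (reservations per node, tasks with no usable replica holder)."""
    reserved = {node: [] for node in capacity}
    unassigned = []
    # Place tasks with fewer replica choices first; break ties by task id.
    for task, holders in sorted(local_tasks.items(), key=lambda kv: (len(kv[1]), kv[0])):
        candidates = [n for n in holders if capacity.get(n, 0) > 0]
        if not candidates:
            unassigned.append(task)
            continue
        best = min(candidates, key=lambda n: ((len(reserved[n]) + 1) / capacity[n], n))
        reserved[best].append(task)
    return reserved, unassigned


if __name__ == "__main__":
    hbs = [Heartbeat("A", 5, 2), Heartbeat("A", 10, 4), Heartbeat("B", 10, 1)]
    cap = estimate_capacity(hbs)  # A about 0.4 tasks/s, B about 0.1 tasks/s
    tasks = {f"t{i}": ["A", "B"] for i in range(1, 6)}
    tasks["t6"] = ["C"]  # replica only on a node with no capacity estimate
    reserved, left = reserve_local_tasks(cap, tasks)
    print({n: len(ts) for n, ts in reserved.items()}, left)
```

With node A estimated at four times B's capacity, the faster node ends up reserving four of the five shared local tasks and B reserves one, which mirrors the abstract's goal of letting stronger nodes reserve more local tasks. Placing low-replica tasks first is a common greedy heuristic chosen here for simplicity, not a detail confirmed by the paper.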