International Conference on Algorithms and Architectures for Parallel Processing

HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters

Abstract

Due to cluster resource competition and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Because machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when still poses serious challenges for cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service, to improve data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes to which future map tasks should be assigned and then preload the input data into memory without delaying the launch of new tasks. To this end, we have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
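
The abstract gives only the basic idea (predict where future map tasks will run, then preload their input into memory), not the prediction method. The following Python sketch illustrates one way such a planner could look; it is not HPSO's actual algorithm. It assumes that the node whose map slot frees soonest receives the next task, and every name in it (Node, MapTask, plan_prefetch, the remaining-time estimates) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    name: str
    # Assumed input: estimated seconds until each busy map slot finishes.
    remaining: List[float]
    # Blocks already requested for this node's in-memory prefetch buffer.
    prefetched: Set[str] = field(default_factory=set)


@dataclass
class MapTask:
    task_id: str
    block: str
    replicas: Set[str]  # names of nodes that store the input block locally


def plan_prefetch(nodes: List[Node], pending: List[MapTask]) -> List[Tuple[str, str, str]]:
    """Return (node, task_id, block) prefetch requests for predicted assignments."""
    candidates = [n for n in nodes if n.remaining]
    candidate_names = {n.name for n in candidates}
    unassigned = list(pending)
    requests = []
    # Assumption: nodes are handed new tasks in order of earliest free slot.
    for node in sorted(candidates, key=lambda n: min(n.remaining)):
        if not unassigned:
            break
        # Prefer a task whose input is already on this node (no prefetch needed).
        task = next((t for t in unassigned if node.name in t.replicas), None)
        needs_prefetch = task is None
        if task is None:
            # Otherwise pick a task that no candidate node holds locally,
            # falling back to the first pending task.
            task = next(
                (t for t in unassigned if not (t.replicas & candidate_names)),
                unassigned[0],
            )
        unassigned.remove(task)
        if needs_prefetch and task.block not in node.prefetched:
            node.prefetched.add(task.block)
            requests.append((node.name, task.task_id, task.block))
    return requests
```

Because the requests are planned while the current tasks are still running, the data transfer overlaps with computation, which matches the abstract's goal of preloading without delaying the launch of new tasks.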