HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters

Abstract

Due to cluster resource competition and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Because machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when still poses serious challenges for cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler built on a prefetching service that improves data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes to which future map tasks should be assigned and then preload their input data into memory without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
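
The abstract describes the idea only at a high level: predict where upcoming map tasks will run, then preload their input into memory before they launch. The Python sketch below illustrates one way such a prefetch plan could be formed; it is not HPSO's actual algorithm. It assumes, hypothetically, that a slot's free time can be estimated from a running task's progress rate, that each pending task's input split has a known set of replica nodes, and that pending tasks are taken in FIFO order. The names `RunningTask`, `PendingTask`, `plan_prefetch`, and `horizon` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    node: str          # node whose map slot this task occupies
    progress: float    # fraction of input processed, in [0, 1]
    elapsed: float     # seconds since the task started

    def remaining_time(self) -> float:
        # Hypothetical estimate: assumes the task keeps its average rate so far.
        if self.progress <= 0:
            return float("inf")
        return self.elapsed * (1.0 - self.progress) / self.progress


@dataclass
class PendingTask:
    task_id: str
    replica_nodes: frozenset  # nodes storing a replica of this task's input split


def plan_prefetch(running, pending, horizon):
    """Pair slots predicted to free up within `horizon` seconds with pending tasks.

    Returns a list of (node, task_id, prefetch) tuples. `prefetch` is True when
    the chosen task has no local replica on that node, so its input split
    should be loaded into that node's memory before the slot frees up.
    """
    # Predicted free slots, soonest first; slots beyond the horizon are ignored.
    freeing = sorted(
        (t.remaining_time(), t.node) for t in running if t.remaining_time() <= horizon
    )
    queue = list(pending)  # FIFO order of waiting map tasks (assumption)
    plan = []
    for _, node in freeing:
        if not queue:
            break
        # Prefer a task whose input already lives on this node (no prefetch needed).
        local = next((p for p in queue if node in p.replica_nodes), None)
        chosen = local if local is not None else queue[0]
        queue.remove(chosen)
        plan.append((node, chosen.task_id, local is None))
    return plan


# Usage example with made-up nodes and tasks.
running = [
    RunningTask("node-a", progress=0.9, elapsed=45.0),  # about 5 s left
    RunningTask("node-b", progress=0.5, elapsed=30.0),  # about 30 s left
    RunningTask("node-c", progress=0.1, elapsed=10.0),  # about 90 s left
]
pending = [
    PendingTask("m1", frozenset({"node-c"})),
    PendingTask("m2", frozenset({"node-b"})),
]
plan = plan_prefetch(running, pending, horizon=60.0)
print(plan)  # [('node-a', 'm1', True), ('node-b', 'm2', False)]
```

Here node-a is expected to free a slot first but holds no replica for any pending task, so `m1` is marked for prefetching into node-a's memory; node-b already stores `m2`'s split, so no prefetch is needed. node-c is left out because its task is not expected to finish within the 60-second horizon. Computing the plan ahead of time lets the prefetch overlap with the tasks still running instead of delaying the next launch.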

Bibliographic Details

Source: International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), 2014, pp. 82-95 (14 pages)
Authors: Mingming Sun; Hang Zhuang; Xuehai Zhou; Kun Lu; Changlong Li
Original format: PDF
Keywords: Data locality; MapReduce clusters; prefetching; task scheduler

Similar Literature

1. Scheduling algorithm based on prefetching in MapReduce clusters [J]. Sun Mingming, Zhuang Hang, Li Changlong. Applied Soft Computing, 2016.

2. A Predictive Map Task Scheduler for Optimizing Data Locality in MapReduce Clusters [J]. Mohamed Merabet, Sidi Mohamed Benslimane, Mahmoud Barhamgi. International Journal of Grid and High Performance Computing, 2018, no. 4.

3. Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy [J]. Benifa J. V. Bibal, Dejey. Wireless Personal Communications: An International Journal, 2017, no. 3.

4. HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters [C]. Mingming Sun, Hang Zhuang, Xuehai Zhou. ICA3PP 2014, 2014.

5. Scheduling in MapReduce Clusters [D]. He, Chen. 2018.

6. MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data [O]. Jingjing Wang, Chen Lin. 2015.

7. Improving MapReduce Performance by Data Prefetching in Heterogeneous or Shared Environments [O]. Tao Gu, Chuang Zuo, Qun Liao. 2015.
