IEEE International Conference on Cluster Computing and Workshops

HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment

Abstract

MapReduce is a programming model that supports distributed and parallel processing for large-scale data-intensive applications such as machine learning, data mining, and scientific simulation. Hadoop is an open-source implementation of the MapReduce programming model. Hadoop is used by many companies, including Yahoo!, Amazon, and Facebook, to perform various data mining tasks on large-scale data sets such as user search logs and visit logs. In these settings, it is very common for multiple users to share the same computing resources, due to practical considerations about cost, system utilization, and manageability. However, Hadoop assumes that all cluster nodes are dedicated to a single user, and therefore fails to guarantee high performance in a shared MapReduce computation environment. In this paper, we propose two optimization schemes, prefetching and pre-shuffling, which improve overall performance in the shared environment while retaining compatibility with native Hadoop. The proposed schemes are implemented in native Hadoop-0.18.3 as a plug-in component called HPMR (High Performance MapReduce Engine). Our evaluation on the Yahoo! Grid platform, with three different workloads and seven types of test sets from Yahoo!, shows that HPMR reduces the execution time by up to 73%.
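For readers unfamiliar with the programming model the abstract refers to, the sketch below is the standard Hadoop word-count example written against the classic org.apache.hadoop.mapred API of the 0.18 era, the Hadoop line the paper builds on. It is a generic illustration of the map, shuffle, and reduce phases; it is not HPMR code, and the class names are purely illustrative.

```java
// Minimal Hadoop word count using the classic (0.18-era) mapred API.
// Generic illustration of the MapReduce programming model, not HPMR code.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

  // Map phase: read one line of input and emit (word, 1) for every token.
  public static class TokenizerMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reduce phase: after the shuffle, sum the counts collected for each word.
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(TokenizerMapper.class);
    conf.setCombinerClass(SumReducer.class);
    conf.setReducerClass(SumReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```

Nothing in this sketch is HPMR-specific; the abstract's point is that HPMR is deployed as a plug-in component on top of this existing model, so user job code like the above runs unchanged.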