Improve Parallelism of Task Execution to Optimize Utilization of MapReduce Cluster Resources

Abstract

MapReduce, as a programming model, has become an important solution for large-scale data-intensive processing. It has been widely used in fields such as Web search, machine learning, and e-commerce. Hadoop, an open-source implementation of MapReduce, is widely used for offline jobs over massive data sets. It consists of MapReduce and HDFS. In our study of Hadoop, we found that its data parallelism is coarse-grained and cannot take full advantage of multi-core systems, which lowers the utilization and efficiency of the whole cluster. To turn Hadoop into a fine-grained data-parallel framework, we propose a strategy that scales the parallelism of execution within each map/reduce task. We implement the strategy as a new feature for Hadoop. Our experiments show that it not only optimizes the utilization of MapReduce cluster resources but also speeds up job completion by up to 3x.
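The abstract does not describe how the strategy is implemented, so the following is only a minimal sketch of the general idea of finer-grained parallelism inside a single map task. It assumes a word-count map function and a strided partitioning of the task's input split across a thread pool. All names (`map_record`, `run_map_task_serial`, `run_map_task_parallel`, `workers`) are hypothetical and are not taken from the paper or from Hadoop's API. In CPython, threads will not speed up CPU-bound work, so the sketch shows the structure only, not the reported speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_record(line):
    """Word-count map function: emit (word, 1) pairs for one input record."""
    return [(word, 1) for word in line.split()]


def run_map_task_serial(split):
    """Coarse-grained task: a single thread walks every record in the split."""
    counts = Counter()
    for line in split:
        for word, n in map_record(line):
            counts[word] += n
    return counts


def run_map_task_parallel(split, workers=4):
    """Fine-grained task: records of one split are spread over several workers."""
    # Assumed partitioning scheme: one strided slice of the split per worker.
    chunks = [split[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_map_task_serial, chunks))
    # Merge the per-worker outputs into the single output of the task.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged
```

Both functions should produce the same counts for the same split; only the number of workers used inside the task differs, which is the knob a strategy like the one described would tune.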
