Performance Optimization for Short MapReduce Job Execution in Hadoop

Abstract

Hadoop MapReduce is a widely used parallel computing framework for solving data-intensive problems. To process large-scale datasets, the fundamental design of standard Hadoop places more emphasis on high data throughput than on job execution performance. This limits performance when Hadoop MapReduce is used to execute short jobs that require quick responses. To speed up the execution of short jobs, this paper proposes optimization methods that improve the execution performance of MapReduce jobs. We made three major optimizations: first, we reduce the time cost of the initialization and termination stages of a job by optimizing its setup and cleanup tasks; second, we replace the pull-model task assignment mechanism with a push model; third, we replace the heartbeat-based communication mechanism with an instant-message communication mechanism for event notifications between the JobTracker and the TaskTrackers. Experimental results show that job execution in our improved version of Hadoop is about 23% faster on average than in standard Hadoop for our test application.
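To illustrate why a pull model can penalize short jobs, the following minimal sketch compares the wait before a ready task is handed to a worker under heartbeat polling with the wait under a pushed notification. It is a toy model, not the paper's implementation: the function names, the 3-second heartbeat interval, and the 10 ms network latency are all assumptions chosen for illustration.

```python
import math


def pull_assignment_delay(ready_time: float, heartbeat_interval: float) -> float:
    """Wait until the next heartbeat at or after ready_time.

    Assumes (hypothetically) that a worker sends heartbeats at
    0, H, 2H, ... and can only receive a task in a heartbeat response.
    """
    next_heartbeat = math.ceil(ready_time / heartbeat_interval) * heartbeat_interval
    return next_heartbeat - ready_time


def push_assignment_delay(network_latency: float) -> float:
    """Under a push model the master sends the task at once,
    so the wait is just the (assumed) one-way network latency."""
    return network_latency


def mean_pull_delay(heartbeat_interval: float, samples: int = 1000) -> float:
    """Average pull delay for ready times spread evenly over one interval."""
    total = 0.0
    for i in range(samples):
        ready_time = (i + 0.5) * heartbeat_interval / samples
        total += pull_assignment_delay(ready_time, heartbeat_interval)
    return total / samples


if __name__ == "__main__":
    interval = 3.0   # illustrative heartbeat interval in seconds (assumption)
    latency = 0.01   # illustrative one-way network latency in seconds (assumption)
    print("mean pull delay (s):", mean_pull_delay(interval))
    print("push delay (s):", push_assignment_delay(latency))
```

In this model the average pull delay is half the heartbeat interval, a fixed cost that matters little for long jobs but can be a noticeable share of a job that runs for only a few seconds.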
