Journal of Supercomputing

Designing a MapReduce performance model in distributed heterogeneous platforms based on benchmarking approach

Abstract

The MapReduce framework is an effective method for parallel processing of big data. Improving the performance of MapReduce clusters and reducing job execution time is a fundamental challenge for this approach. Two problems arise in particular: how to maximize the execution overlap between jobs, and how to produce an optimal job schedule. Both depend on a precise model for estimating job execution time, which is difficult to build because of the large number and volume of submitted jobs, the limited resources available, and the need for proper Hadoop configuration. This paper presents a model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster. It also designs a novel heuristic method that significantly reduces the makespan of the jobs. In this method, a job profiling tool first extracts the execution details of the MapReduce phases through log analysis. Then, machine learning methods and statistical analysis are used to build a model that predicts runtime. Finally, a second tool, the job submission and monitoring tool, is used to calculate makespan. Experiments were conducted on the benchmarks under identical conditions for all jobs. The results show that the proposed method achieved a higher average makespan speedup than the unoptimized case.
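
To make the terms concrete, the sketch below shows how predicted runtimes can feed a makespan calculation and how job ordering changes the result. It is an illustration only, not the paper's heuristic: the abstract does not describe that algorithm, so a standard longest-processing-time-first ordering is swapped in. The job names, runtimes, and the function name list_schedule_makespan are hypothetical, and the execution slots are treated as identical, which ignores the heterogeneity the paper models.

```python
import heapq


def list_schedule_makespan(runtimes, num_slots):
    """Assign each job, in the given order, to the slot that frees up earliest.

    Returns the makespan: the time at which the last job finishes.
    Assumes identical slots (a simplification; the paper targets
    heterogeneous clusters).
    """
    slots = [0.0] * num_slots  # finish time of each slot
    heapq.heapify(slots)
    for runtime in runtimes:
        earliest = heapq.heappop(slots)
        heapq.heappush(slots, earliest + runtime)
    return max(slots)


# Hypothetical predicted runtimes in seconds, standing in for the
# output of a runtime prediction model.
predicted = {"grep": 20.0, "wordcount": 30.0, "join": 40.0, "sort": 90.0}

# Baseline: run jobs in submission order.
baseline = list_schedule_makespan(list(predicted.values()), num_slots=2)

# Swapped-in heuristic: longest predicted job first.
optimized = list_schedule_makespan(
    sorted(predicted.values(), reverse=True), num_slots=2
)

speedup = baseline / optimized
print(f"baseline={baseline}, optimized={optimized}, speedup={speedup:.2f}")
```

The ratio of the baseline makespan to the optimized makespan is the kind of "makespan speedup" the abstract reports; the accuracy of the predicted runtimes determines how well any such ordering works in practice.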