
MapReduce Workload Modeling with Statistical Approach



Abstract

Large-scale data-intensive cloud computing with the MapReduce framework is becoming pervasive for the core business of many academic, government, and industrial organizations. Hadoop, a state-of-the-art open source project, is by far the most successful realization of the MapReduce framework. While MapReduce is easy to use, efficient, and reliable for data-intensive computations, the large number of configuration parameters in Hadoop makes it unexpectedly hard to run varied workloads on a Hadoop cluster effectively. Consequently, developers with little experience of the Hadoop configuration system may devote significant effort to writing an application that performs poorly, either because they have no idea how these configurations influence performance or because they are not even aware that the configurations exist. There is a pressing need for comprehensive analysis and performance modeling to ease MapReduce application development and guide performance optimization under different Hadoop configurations. In this paper, we propose a statistical analysis approach to identify the relationships among workload characteristics, Hadoop configurations, and workload performance. We apply principal component analysis and cluster analysis to 45 different metrics to derive relationships between workload characteristics and the corresponding performance under different Hadoop configurations. We also construct regression models that attempt to predict the performance of various workloads under different Hadoop configurations. Our analysis reveals several non-intuitive relationships between workload characteristics and performance, and the experimental results demonstrate that our regression models accurately predict the performance of MapReduce workloads under different Hadoop configurations.
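The pipeline named in the abstract (principal component analysis on workload metrics, followed by a regression model for performance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the three metric columns and the hidden `load` factor are hypothetical stand-ins for the 45 metrics the paper mentions, and the single-predictor least-squares fit is not the authors' model. Cluster analysis is omitted for brevity.

```python
import math
import random

def standardize(rows):
    """Scale each metric column to zero mean and unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def first_component(z, iters=200):
    """First principal component via power iteration on the covariance matrix."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic runs (assumption): one hidden "load" factor drives three
# hypothetical metrics and the job runtime, plus Gaussian noise.
rng = random.Random(0)
metrics, runtimes = [], []
for _ in range(60):
    load = rng.uniform(0.0, 10.0)
    metrics.append([load + rng.gauss(0, 0.3),         # hypothetical CPU metric
                    2.0 * load + rng.gauss(0, 0.3),   # hypothetical I/O metric
                    0.5 * load + rng.gauss(0, 0.3)])  # hypothetical shuffle metric
    runtimes.append(10.0 + 3.0 * load + rng.gauss(0, 0.5))

z = standardize(metrics)
component = first_component(z)
# Project every run onto the first principal component.
scores = [sum(a * b for a, b in zip(row, component)) for row in z]
slope, intercept = fit_line(scores, runtimes)

# Coefficient of determination of the fitted regression.
mean_rt = sum(runtimes) / len(runtimes)
ss_res = sum((y - (slope * s + intercept)) ** 2 for s, y in zip(scores, runtimes))
ss_tot = sum((y - mean_rt) ** 2 for y in runtimes)
r_squared = 1.0 - ss_res / ss_tot
print("PC1 loadings:", component)
print("R^2 of runtime regressed on PC1 score:", r_squared)
```

In this toy setting a single component captures most of the variance because all three metrics share one driver. With real Hadoop measurements, several components would likely be retained and the regression would take multiple predictors.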

Bibliographic record

  • Source
    Journal of Grid Computing | 2012, Issue 2 | pp. 279-310 | 32 pages
  • Author affiliations

    Sino-German Joint Software Institute, The State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China;


  • Original format: PDF
  • Language: English
  • Keywords

    Cloud computing; Data intensive computing; MapReduce; Workload characterization; Statistical analysis; Performance prediction;

