MapReduce Workload Modeling with Statistical Approach

Hailong Yang; Zhongzhi Luan; Wenjun Li; Depei Qian


Abstract

Large-scale data-intensive cloud computing with the MapReduce framework is becoming pervasive in the core business of many academic, government, and industrial organizations. Hadoop, a state-of-the-art open source project, is by far the most successful realization of the MapReduce framework. While MapReduce is easy to use, efficient, and reliable for data-intensive computations, the excessive number of configuration parameters in Hadoop makes it unexpectedly hard to run varied workloads effectively on a Hadoop cluster. Consequently, developers with little experience of the Hadoop configuration system may devote significant effort to writing an application that still performs poorly, either because they do not know how these configurations influence performance, or because they are not even aware that these configurations exist. There is a pressing need for comprehensive analysis and performance modeling to ease MapReduce application development and guide performance optimization under different Hadoop configurations. In this paper, we propose a statistical analysis approach to identify the relationships among workload characteristics, Hadoop configurations, and workload performance. We apply principal component analysis and cluster analysis to 45 different metrics to derive relationships between workload characteristics and the corresponding performance under different Hadoop configurations. We also construct regression models that attempt to predict the performance of various workloads under different Hadoop configurations. Our analysis reveals several non-intuitive relationships between workload characteristics and performance, and the experimental results demonstrate that our regression models accurately predict the performance of MapReduce workloads under different Hadoop configurations.
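The abstract names the statistical steps but not their details, so the following is only a minimal sketch of the general idea: reduce correlated workload metrics with principal component analysis, then regress a performance figure on the reduced score. Everything in it is an assumption made for illustration: the data is synthetic, five metrics stand in for the paper's 45, only the first principal component is used, the cluster analysis step is omitted, and the regression is a plain one-variable least-squares fit rather than the authors' models. All function names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Scale every metric (column) to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def first_principal_component(z, iters=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(d)] for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def demo(seed=0, runs=40):
    """Synthetic experiment: metrics and runtime share one hidden load factor."""
    rng = random.Random(seed)
    weights = [1.0, 0.8, 0.6, 0.9, 0.7]  # assumed metric sensitivities
    metrics, runtime = [], []
    for _ in range(runs):
        load = rng.gauss(0, 1)
        metrics.append([w * load + rng.gauss(0, 0.3) for w in weights])
        runtime.append(100 + 20 * load + rng.gauss(0, 2))

    z = standardize(metrics)
    v = first_principal_component(z)
    # Project each run onto the first principal component.
    scores = [sum(vj * zj for vj, zj in zip(v, row)) for row in z]

    a, b = fit_line(scores, runtime)
    mean_rt = sum(runtime) / runs
    ss_res = sum((y - (a + b * s)) ** 2 for s, y in zip(scores, runtime))
    ss_tot = sum((y - mean_rt) ** 2 for y in runtime)
    return v, a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    loadings, intercept, slope, r2 = demo()
    print("PC1 loadings:", [round(x, 3) for x in loadings])
    print("runtime ~ %.2f + %.2f * PC1 score, R^2 = %.3f" % (intercept, slope, r2))
```

On real measurements the metric matrix would come from profiling runs under different Hadoop configurations, and more than one component would usually be retained before clustering or regression.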


Bibliographic record

Source: Journal of Grid Computing, 2012, Issue 2, pp. 279-310 (32 pages)
Authors: Hailong Yang; Zhongzhi Luan; Wenjun Li; Depei Qian
Author affiliations:

Sino-German Joint Software Institute, The State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China (listed for all four authors)

Original format: PDF
Language: English
Keywords: Cloud computing; Data intensive computing; MapReduce; Workload characterization; Statistical analysis; Performance prediction

