Parameterizable benchmarking framework for designing a MapReduce performance model†

Zhang Zhuoyao; Cherkasova Ludmila; Loo Boon Thau


Abstract

In MapReduce environments, many applications have to achieve different performance goals for producing time-relevant results. One typical user question is how to estimate the completion time of a MapReduce program as a function of varying input dataset sizes and given cluster resources. In this work, we offer a novel performance evaluation framework for answering this question. We analyze the MapReduce processing pipeline and utilize the fact that the execution of map (reduce) tasks consists of specific, well-defined data processing phases. Only the map and reduce functions are custom, and their executions are user-defined for different MapReduce jobs. The executions of the remaining phases are generic (i.e., defined by the MapReduce framework code) and depend on the amount of data processed by the phase and the performance of the underlying Hadoop cluster. First, we design a set of parameterizable microbenchmarks to profile the execution of generic phases and to derive a platform performance model of a given Hadoop cluster. Then, using the job's past executions, we summarize the job's properties and the performance of its custom map/reduce functions in a compact job profile. Finally, by combining the knowledge of the job profile and the derived platform performance model, we introduce a MapReduce performance model that estimates the program completion time for processing a new dataset. The proposed benchmarking approach derives an accurate performance model of Hadoop's generic execution phases once, and this model is then reused for predicting the performance of different applications. The evaluation study justifies our approach and the proposed framework: we use a diverse suite of 12 MapReduce applications to validate the proposed model. The predicted completion times for most experiments are within 10% of the measured ones (with a worst-case error of 17%) on our 66-node Hadoop cluster. Copyright © 2014 John Wiley & Sons, Ltd
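The three steps in the abstract (platform model from microbenchmarks, job profile from past runs, combined completion-time estimate) can be pictured with a small sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: the phase names ("read", "spill", "shuffle", "write"), the linear form of each phase model, the JobProfile fields, the wave-based completion-time formula, and all sample numbers.

```python
import math
from dataclasses import dataclass


def fit_line(points):
    """Least-squares fit of seconds = slope * megabytes + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


@dataclass
class JobProfile:
    # Hypothetical profile fields summarizing a job's past executions.
    map_seconds_per_mb: float     # cost of the custom map function
    map_selectivity: float        # map output size / map input size
    reduce_seconds_per_mb: float  # cost of the custom reduce function
    reduce_selectivity: float     # reduce output size / reduce input size


def phase_time(platform, phase, megabytes):
    """Duration of one generic phase according to the fitted platform model."""
    slope, intercept = platform[phase]
    return slope * megabytes + intercept


def estimate_completion(platform, profile, input_mb, split_mb,
                        num_reduces, map_slots, reduce_slots):
    """Estimate job time as (map waves * map task) + (reduce waves * reduce task)."""
    num_maps = math.ceil(input_mb / split_mb)
    map_out_mb = split_mb * profile.map_selectivity
    map_task = (phase_time(platform, "read", split_mb)
                + profile.map_seconds_per_mb * split_mb
                + phase_time(platform, "spill", map_out_mb))
    reduce_in_mb = input_mb * profile.map_selectivity / num_reduces
    reduce_out_mb = reduce_in_mb * profile.reduce_selectivity
    reduce_task = (phase_time(platform, "shuffle", reduce_in_mb)
                   + profile.reduce_seconds_per_mb * reduce_in_mb
                   + phase_time(platform, "write", reduce_out_mb))
    map_waves = math.ceil(num_maps / map_slots)
    reduce_waves = math.ceil(num_reduces / reduce_slots)
    return map_waves * map_task + reduce_waves * reduce_task


# Made-up microbenchmark measurements per phase: (megabytes, seconds).
benchmarks = {
    "read": [(16, 0.5), (32, 0.9), (64, 1.8)],
    "spill": [(16, 0.7), (32, 1.3), (64, 2.6)],
    "shuffle": [(16, 1.0), (32, 2.1), (64, 4.0)],
    "write": [(16, 0.9), (32, 1.7), (64, 3.5)],
}
# Step 1: the platform model is fitted once per cluster.
platform = {phase: fit_line(points) for phase, points in benchmarks.items()}
# Step 2: a made-up job profile; step 3: estimate for a new dataset size.
profile = JobProfile(0.05, 0.5, 0.08, 0.2)
estimate = estimate_completion(platform, profile, input_mb=10240, split_mb=64,
                               num_reduces=32, map_slots=64, reduce_slots=32)
print(f"estimated completion time: {estimate:.1f} s")
```

The point of the split is that `platform` depends only on the cluster and `profile` only on the job, so a platform model fitted once can be reused with the profile of any other application.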

Bibliographic details

Source
Concurrency and Computation: Practice and Experience, 2014, issue 12, pp. 2005-2026 (22 pages)

Authors
Zhang Zhuoyao; Cherkasova Ludmila; Loo Boon Thau

Author affiliations

University of Pennsylvania, Department of Computer and Information Science, Philadelphia, PA, USA;

Hewlett-Packard Labs, Palo Alto, CA, USA;

University of Pennsylvania, Department of Computer and Information Science, Philadelphia, PA, USA

Original format: PDF

Language of text: English

Keywords
MapReduce processing pipeline; Hadoop cluster; benchmarking; job profiling; performance modeling
