
Evaluating MapReduce System Performance: A Simulation Approach.

Abstract

The scale of data generated and processed is exploding in the Big Data era. The MapReduce system, popularized by open-source Hadoop, is a powerful tool for tackling this data explosion and is widely employed in areas that involve large-scale data. In many circumstances, hypothetical MapReduce systems must be evaluated, e.g., to provision a new MapReduce system that meets a given performance goal, to upgrade a running system to meet increasing business demands, or to evaluate novel network topologies, new scheduling algorithms, or resource arrangement schemes. The traditional trial-and-error solution is a time-consuming and costly process in which a real cluster is first built and then benchmarked. In this dissertation, we propose to simulate MapReduce systems and to evaluate hypothetical MapReduce systems using simulation. This simulation approach offers significantly lower turnaround time and lower cost than experiments. Simulation cannot entirely replace experiments, but it can be used as a preliminary step to reveal potential flaws and gain critical insights.

We studied MapReduce systems in detail and developed a comprehensive performance model for MapReduce, including sub-task, phase-level performance models for both map and reduce tasks and a model for resource contention among multiple processes running concurrently. Based on the performance model, we developed a comprehensive simulator for MapReduce, MRPerf. MRPerf is the first full-featured MapReduce simulator. It supports both workload simulation and resource contention, and it still offers the most complete feature set among all MapReduce simulators to date. Using MRPerf, we conducted two case studies, evaluating scheduling algorithms in MapReduce and shared storage in MapReduce, without building real clusters.

Furthermore, to integrate simulation and performance prediction into MapReduce systems and leverage predictions to improve system performance, we developed an online prediction framework for MapReduce, which periodically runs simulations within a live Hadoop MapReduce system. The framework can predict task execution within a window in the near future. These predictions can be used by other components of a MapReduce system to improve performance. Our results show that the framework achieves high prediction accuracy and incurs negligible overhead. We present two potential use cases: prefetching and a dynamically adapting scheduler.
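To make the idea of a phase-level performance model with resource contention more concrete, the following is a minimal Python sketch. It is not MRPerf's actual model: the phase names, the capacity figures, and the even fair-share rule (a resource's bandwidth divided equally among the processes using it) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str      # hypothetical phase name, e.g. "read", "map", "spill"
    resource: str  # resource the phase is bound by: "disk", "cpu", or "net"
    work: float    # amount of work in MB (assumed unit)


def phase_time(phase, capacity, sharers):
    """Time for one phase when a resource is split evenly among its users.

    capacity: MB/s available per resource (assumed figures).
    sharers:  number of concurrent processes per resource; defaults to 1.
    """
    share = capacity[phase.resource] / max(1, sharers.get(phase.resource, 1))
    return phase.work / share


def task_time(phases, capacity, sharers):
    """A task's time is modeled as the sum of its sequential phase times."""
    return sum(phase_time(p, capacity, sharers) for p in phases)


# Illustrative numbers only; none of them come from the dissertation.
capacity = {"disk": 100.0, "cpu": 200.0, "net": 50.0}
map_task = [
    Phase("read", "disk", 64.0),
    Phase("map", "cpu", 64.0),
    Phase("spill", "disk", 16.0),
]

alone = task_time(map_task, capacity, {})               # no contention
contended = task_time(map_task, capacity, {"disk": 2})  # two processes share the disk
```

In this sketch, doubling the number of processes on the disk doubles the time of the disk-bound phases and leaves the CPU-bound phase unchanged, which is the kind of effect a contention model has to capture before a simulator can compare scheduling or storage configurations.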

Bibliographic record

  • Author

    Wang, Guanying

  • Affiliation

    Virginia Polytechnic Institute and State University

  • Degree-granting institution: Virginia Polytechnic Institute and State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 108
  • Original format: PDF
  • Language: English
