
Performance modeling and resource management for MapReduce applications.



Abstract

Big Data analytics is increasingly performed using the MapReduce paradigm, with its open-source implementation Hadoop as the platform of choice. Many applications associated with live business intelligence are written as complex data analysis programs defined by directed acyclic graphs of MapReduce jobs. An increasing number of these applications also require completion time guarantees. The advent of cloud computing offers a competitive alternative for data analytics problems, but it also introduces new challenges in provisioning clusters that provide the best cost-performance trade-offs.

In this dissertation, we aim to develop a performance evaluation framework that enables automatic resource management for MapReduce applications in pursuit of different optimization goals. It consists of the following components: (1) a performance modeling framework that estimates the completion time of a given MapReduce application executed on a Hadoop cluster, based on its input data sets, its job settings, and the amount of resources allocated to process it; (2) a resource allocation strategy for deadline-driven MapReduce applications that automatically tailors and controls the resources allocated to different applications on a shared Hadoop cluster so that they meet their (soft) deadlines; (3) a simulator-based solution to the resource provisioning problem in a public cloud environment that guides users in determining the types and amount of resources they should lease from the service provider to achieve different goals; (4) an optimization strategy that automatically determines the optimal job settings within a MapReduce application for efficient execution and resource usage.

We validate the accuracy, efficiency, and performance benefits of the proposed framework using a set of realistic MapReduce applications on both a private cluster and a public cloud environment.
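To make components (1) and (2) more concrete, the following is a minimal sketch of how a completion time estimate and a deadline-driven slot count could be computed. It is not the dissertation's model, which the abstract does not specify. It assumes a single stage of independent tasks scheduled greedily onto identical slots and uses the classic makespan bounds for greedy assignment. The function names `makespan_bounds` and `min_slots_for_deadline` and the sample task durations are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def makespan_bounds(durations: Sequence[float], slots: int) -> Tuple[float, float]:
    """Bound the completion time of one stage of independent tasks.

    Assumption (not from the abstract): tasks are assigned greedily to
    `slots` identical slots, so the classic greedy-scheduling bounds apply.
    """
    if slots < 1 or not durations:
        raise ValueError("need at least one slot and one task")
    n = len(durations)
    total = sum(durations)
    avg = total / n
    longest = max(durations)
    # No schedule can beat perfect load balance or the longest single task.
    lower = max(total / slots, longest)
    # Worst case: the longest task starts last, after the other n-1 tasks
    # have been spread evenly over the slots.
    upper = (n - 1) * avg / slots + longest
    return lower, upper


def min_slots_for_deadline(durations: Sequence[float], deadline: float) -> Optional[int]:
    """Smallest slot count whose upper bound meets the deadline, or None."""
    n = len(durations)
    avg = sum(durations) / n
    longest = max(durations)
    if deadline < longest:
        return None  # the longest task alone already misses the deadline
    if n == 1:
        return 1
    if deadline == longest:
        return None  # no slack left for the remaining tasks
    # Solve (n - 1) * avg / k + longest <= deadline for the slot count k.
    return max(1, math.ceil((n - 1) * avg / (deadline - longest)))


if __name__ == "__main__":
    tasks = [10.0] * 8  # hypothetical map task durations in seconds
    print(makespan_bounds(tasks, slots=4))
    print(min_slots_for_deadline(tasks, deadline=30.0))
```

Sizing the allocation from the upper bound is the conservative choice: if the worst-case estimate fits the deadline, the greedy schedule should as well, at the cost of possibly reserving more slots than strictly needed.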

Bibliographic record

  • Author: Zhang, Zhuoyao
  • Author affiliation: University of Pennsylvania
  • Degree-granting institution: University of Pennsylvania
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 170
  • Original format: PDF
  • Language: English
