International Conference on Cloud and Autonomic Computing

Automating Platform Selection for MapReduce Processing in the Cloud

Abstract

Cloud computing enables a user to quickly provision a Hadoop cluster of any desired size and then pay for the time these resources were used. With the same budget, a user can rent a larger amount of resources and process a scale-out application in a shorter time, or rent a smaller cluster and pay for a longer processing time. Moreover, the Cloud offers a variety of VM instance types (e.g., small, medium, or large EC2 instances), and the capacity differences among the offered VMs are reflected in their pricing. Therefore, for the same price a user can get a variety of "similar capacity" Hadoop clusters based on different VM instance types. We observe that the performance of MapReduce applications may vary significantly across platforms. This makes selecting the best cost/performance platform for a given workload a non-trivial problem, especially when the workload contains multiple jobs with different platform preferences. In this work, we design a framework for solving the following problem: given a completion time target for a set of MapReduce jobs, determine a homogeneous or heterogeneous Hadoop cluster configuration (i.e., the number and types of VMs, and the job schedule) for processing these jobs within the given deadline while minimizing the rented infrastructure cost. We generalize the proposed framework to take into account possible node failures and degraded performance goals. Our evaluation study with the Amazon EC2 platform reveals that, for different workload mixes, an optimized platform choice may result in 45-68% cost savings for achieving the same performance objectives compared with different (but seemingly equivalent) choices. Moreover, depending on the workload, the heterogeneous solution may outperform the homogeneous cluster solution by 26-42%. We analyze and discuss possible causes for the observed performance differences of MapReduce processing on the Amazon EC2 platforms.
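The selection problem stated in the abstract (meet a deadline at minimum rental cost) can be illustrated with a small brute-force search. This is only a sketch under strong assumptions and is not the framework from the paper: it covers homogeneous clusters only, assumes perfect linear scale-out, and assumes billing per started hour. The `InstanceType` class, the `work_rate` field, and all prices and rates are hypothetical placeholders, not real EC2 figures.

```python
from dataclasses import dataclass
from math import ceil
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    price_per_hour: float  # hypothetical price, not real EC2 pricing
    work_rate: float       # hypothetical work units processed per hour per VM


def cheapest_homogeneous_cluster(
    types: Iterable[InstanceType],
    total_work: float,
    deadline_hours: float,
    max_vms: int = 100,
) -> Optional[Tuple[float, str, int]]:
    """Return (cost, type_name, n_vms) for the cheapest cluster that meets
    the deadline, or None if no candidate cluster is fast enough."""
    best = None
    for t in types:
        for n in range(1, max_vms + 1):
            # Assumption: runtime shrinks linearly with the number of VMs.
            runtime = total_work / (n * t.work_rate)
            if runtime > deadline_hours:
                continue  # this cluster misses the deadline
            # Assumption: every VM is billed for each started hour.
            cost = n * t.price_per_hour * ceil(runtime)
            if best is None or cost < best[0]:
                best = (cost, t.name, n)
    return best


if __name__ == "__main__":
    catalog = [
        InstanceType("small", price_per_hour=1.0, work_rate=1.0),
        InstanceType("large", price_per_hour=4.0, work_rate=5.0),
    ]
    print(cheapest_homogeneous_cluster(catalog, total_work=20, deadline_hours=2))
```

In a realistic setting the linear runtime model would be replaced by measured or profiled per-platform job performance, since the abstract notes that MapReduce performance may vary significantly across instance types.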
