Automating Platform Selection for MapReduce Processing in the Cloud

Abstract

Cloud computing enables a user to quickly provision a Hadoop cluster of any desired size and then pay for the time these resources were used. With the same budget, a user can rent a larger amount of resources and process their scale-out application in a shorter time, or rent a smaller cluster but pay for a longer processing time. Moreover, the Cloud offers a variety of VM instance types (e.g., small, medium, or large EC2 instances). The capacity differences of the offered VMs are reflected in their pricing. Therefore, for the same price, a user can again get a variety of "similar capacity" Hadoop clusters based on different VM instance types. We observe that the performance of MapReduce applications may vary significantly on different platforms. This makes selecting the best cost/performance platform for a given workload a non-trivial problem, especially when the workload contains multiple jobs with different platform preferences. In this work, we design a framework for solving the following problem: given a completion time target for a set of MapReduce jobs, determine a homogeneous or heterogeneous Hadoop cluster configuration (i.e., the number and types of VMs, and the job schedule) for processing these jobs within the given deadline while minimizing the rented infrastructure cost. We generalize the proposed framework to take into account possible node failures and degraded performance goals. Our evaluation study with the Amazon EC2 platform reveals that, for different workload mixes, an optimized platform choice may yield 45-68% cost savings for achieving the same performance objectives, compared with different (but seemingly equivalent) choices. Moreover, depending on the workload, the heterogeneous solution may outperform the homogeneous cluster solution by 26-42%. We analyze and discuss possible causes of the observed performance differences of MapReduce processing on the Amazon EC2 platforms.
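
The problem stated above (choose the number and types of VMs so that a set of jobs meets a deadline at minimal rental cost) can be illustrated with a small brute-force search. This is not the authors' framework, which the abstract does not detail; it is a toy sketch under loudly stated assumptions: the instance names, prices, job names and per-job work figures are hypothetical; a job needing W node-hours finishes in W/n hours on n nodes (perfect scaling); jobs sharing a sub-cluster run one after another; and each node is billed per started hour. The homogeneous search puts all jobs on one instance type, while the heterogeneous search tries every assignment of jobs to types and sizes one sub-cluster per type, with the sub-clusters running in parallel.

```python
import math
from fractions import Fraction
from itertools import product

# Hypothetical instance catalogue: name -> price per node-hour (USD).
# Illustrative numbers only, not actual EC2 prices.
PRICES = {"small": Fraction(6, 100), "large": Fraction(24, 100)}

# Hypothetical job profiles: node-hours each job needs on each instance type.
# Assumption: a job's run time on n nodes is work / n (perfect scaling).
WORK = {
    "sort":    {"small": 40, "large": 8},   # runs relatively better on large
    "wordcnt": {"small": 12, "large": 6},   # runs relatively better on small
}


def cheapest_cluster(itype, jobs, deadline_h, max_nodes=64):
    """Cheapest (cost, nodes) for running `jobs` one after another on a single
    sub-cluster of `itype` within `deadline_h` hours, or None if infeasible.
    Assumes each node is billed per started hour."""
    total = sum(Fraction(WORK[j][itype]) for j in jobs)
    best = None
    for n in range(1, max_nodes + 1):
        makespan = total / n
        if makespan > deadline_h:
            continue
        cost = n * PRICES[itype] * math.ceil(makespan)
        if best is None or cost < best[0]:
            best = (cost, n)
    return best


def cheapest_homogeneous(deadline_h):
    """Best single-type cluster for all jobs: (cost, type, nodes) or None."""
    feasible = []
    for itype in PRICES:
        res = cheapest_cluster(itype, list(WORK), deadline_h)
        if res is not None:
            feasible.append((res[0], itype, res[1]))
    return min(feasible) if feasible else None


def cheapest_heterogeneous(deadline_h):
    """Try every job-to-type assignment; one sub-cluster per used type.
    Returns (total_cost, {type: (nodes, jobs)}) or None."""
    jobs = list(WORK)
    best = None
    for assignment in product(PRICES, repeat=len(jobs)):
        total_cost, plan = Fraction(0), {}
        for itype in PRICES:
            group = [j for j, t in zip(jobs, assignment) if t == itype]
            if not group:
                continue
            res = cheapest_cluster(itype, group, deadline_h)
            if res is None:
                break  # this assignment cannot meet the deadline
            total_cost += res[0]
            plan[itype] = (res[1], group)
        else:  # every sub-cluster met the deadline
            if best is None or total_cost < best[0]:
                best = (total_cost, plan)
    return best


if __name__ == "__main__":
    cost, itype, nodes = cheapest_homogeneous(4)
    print(f"homogeneous: {nodes} x {itype}, ${float(cost):.2f}")  # 13 x small, $3.12
    cost, plan = cheapest_heterogeneous(4)
    print(f"heterogeneous: {plan}, ${float(cost):.2f}")  # ..., $2.64
```

With these made-up numbers and a 4-hour deadline, the mixed plan (sort on large nodes, wordcnt on small ones) comes out cheaper than the best single-type cluster, a miniature of the abstract's point that jobs with different platform preferences can benefit from a heterogeneous cluster. Exhaustive assignment grows as (number of types)^(number of jobs), so this approach only suits tiny examples.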

Bibliographic details

Source: International Conference on Cloud and Autonomic Computing, 2015, pp. 125-136 (12 pages)
Authors: Zhuoyao Zhang; Cherkasova Ludmila; Boon Thau Loo
Original format: PDF
Keywords: cloud computing; parallel processing; pattern clustering; scheduling; Amazon EC2 platform; MapReduce applications; MapReduce jobs; MapReduce processing; VM instances; VM pricing; cloud computing; cost savings; heterogeneous Hadoop cluster configuration; heterogeneous solution; homogeneous Hadoop cluster configuration; homogeneous cluster solution; infrastructure cost; job schedule processing; node failures; platform selection automation; workload mixes; Cloud computing; Clustering algorithms; Fault tolerance; Fault tolerant systems; Random access memory; Schedules; Upper bound; Amazon EC2; Heterogeneous Hadoop Clusters; MapReduce; optimized platform choice; performance; simulation

Similar literature

1. A Cloud-based Platform for Automated Order Processing in Additive Manufacturing [J]. Jan-Peer Rudolph, Claus Emmelmann. Procedia CIRP, 2017, No. 1.

2. Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms [J]. Qiang Liu, Tim Todman, Wayne Luk. Journal of Signal Processing Systems for Signal, Image, and Video Technology, 2012, No. 1.

3. Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform [J]. Bin Yang, Wenzheng Bao, De-Shuang Huang. Scientific Reports, 2018, No. 1.

4. Automating Platform Selection for MapReduce Processing in the Cloud [C]. Zhuoyao Zhang, Cherkasova Ludmila, Boon Thau Loo. International Conference on Cloud and Autonomic Computing, 2015.

5. The Adoption Process for Cloud Computing Architecture Platforms [D]. Rush, Joseph. 2020.

6. Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds [O]. Sebastian Schönherr, Lukas Forer, Hansi Weißensteiner. 2012.

7. A Cloud-based Platform for Automated Order Processing in Additive Manufacturing [O]. Jan-Peer Rudolph, Claus Emmelmann. 2017.
