Journal: Concurrency, practice and experience

Improving MapReduce scheduler for heterogeneous workloads in a heterogeneous environment

Abstract

Big data is strongly influencing businesses and research sectors to become more data-driven. Hadoop MapReduce is a cost-effective way to process large-scale datasets and is offered as a service over the Internet. Even though cloud service providers promise an infinite amount of resources on demand, some of the virtual resources hired for MapReduce inevitably remain underutilized, and makespan is adversely affected, because of the various heterogeneities that arise when MapReduce is offered as a service. Because MapReduce v2 allows users to define the container size for map and reduce tasks, the jobs in a batch become heterogeneous and behave differently. In addition, virtual machines of different capacities in a MapReduce virtual cluster accommodate different numbers of map/reduce tasks. These factors strongly affect resource utilization in the virtual cluster and the makespan of a batch of MapReduce jobs, yet default MapReduce job schedulers do not consider these heterogeneities in a cloud environment. Moreover, the virtual machines in a MapReduce virtual cluster process an equal number of data blocks regardless of their capacity, which lengthens the makespan. We therefore devised a heuristic-based MapReduce job scheduler that exploits heterogeneity at both the virtual machine and the MapReduce workload level to improve resource utilization and makespan. We proposed two methods to achieve this: (i) roulette wheel scheme-based placement of data blocks on heterogeneous virtual machines, and (ii) constrained 2-dimensional bin packing to place heterogeneous map/reduce tasks. We compared the heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results showed that the proposed scheduler improved makespan and resource utilization by 45.6% and 47.9%, respectively, over the classical fair scheduler.
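
Method (i) gives each virtual machine a chance of receiving a data block in proportion to its capacity, so larger VMs receive more blocks instead of the equal share described above. The abstract does not say how capacity is measured or how the wheel is weighted, so the following Python sketch assumes a single numeric capacity score per VM (for example, its vCPU count). The function `roulette_wheel_placement` and its parameters are hypothetical and illustrate generic roulette-wheel (fitness-proportionate) selection, not the authors' exact scheme.

```python
import bisect
import itertools
import random


def roulette_wheel_placement(num_blocks, capacities, seed=None):
    """Assign each data block to a VM index with probability proportional to
    that VM's capacity (roulette-wheel / fitness-proportionate selection).

    capacities: hypothetical per-VM capacity scores (e.g. vCPUs); how the
    real scheduler measures capacity is an assumption here.
    Returns a list whose i-th entry is the VM index chosen for block i.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if not capacities or any(c < 0 for c in capacities) or sum(capacities) <= 0:
        raise ValueError("capacities must be non-negative with a positive total")

    rng = random.Random(seed)
    # Each VM owns a slice of the wheel; cumulative[i] is where VM i's slice ends.
    cumulative = list(itertools.accumulate(capacities))
    total = cumulative[-1]
    last = len(cumulative) - 1

    placement = []
    for _ in range(num_blocks):
        spin = rng.random() * total  # uniform point in [0, total)
        # The first slice ending beyond the spin owns that point. Zero-capacity
        # VMs have empty slices and are never chosen; the hi bound keeps the
        # index in range even under floating-point rounding.
        placement.append(bisect.bisect_right(cumulative, spin, 0, last))
    return placement


if __name__ == "__main__":
    # Three hypothetical VMs whose capacities are in the ratio 1:2:4.
    blocks = roulette_wheel_placement(1000, [2, 4, 8], seed=42)
    # Counts come out roughly 1:2:4; exact values depend on the seed.
    print([blocks.count(vm) for vm in range(3)])
```

The standard library's `random.choices(range(len(capacities)), weights=capacities, k=num_blocks)` performs the same weighted selection in one call; the explicit loop only makes the wheel visible.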
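
Method (ii) treats each virtual machine as a bin with two capacity dimensions. Assuming those dimensions are vcores and memory (the resources YARN containers are sized by), a minimal sketch is first-fit decreasing, a standard heuristic for multi-dimensional bin packing. The abstract does not describe the authors' heuristic or their extra constraints, so this Python sketch models only the constraint that a task's vcore and memory demands must both fit in a VM's free capacity. The `Task` and `VM` classes, the dominant-demand size key, and all resource figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A map or reduce container request (hypothetical fields)."""
    name: str
    vcores: int
    memory_mb: int


@dataclass
class VM:
    """A virtual machine treated as a bin; vcores/memory_mb hold FREE capacity."""
    name: str
    vcores: int
    memory_mb: int
    assigned: list = field(default_factory=list)

    def fits(self, task):
        # Packing constraint: both dimensions must have room for the task.
        return task.vcores <= self.vcores and task.memory_mb <= self.memory_mb


def first_fit_decreasing_2d(tasks, vms):
    """Place tasks on VMs with first-fit decreasing over two resource dimensions.

    Mutates each VM's free capacity and assigned list; returns the names of
    tasks that fit nowhere (left for a later scheduling round).
    """
    if not vms:
        raise ValueError("at least one VM is required")
    # Reference capacities, used only to compare demands across dimensions.
    max_cores = max(vm.vcores for vm in vms) or 1
    max_mem = max(vm.memory_mb for vm in vms) or 1

    def size(task):
        # A task's size is its larger normalized demand (its dominant dimension).
        return max(task.vcores / max_cores, task.memory_mb / max_mem)

    pending = []
    # Largest tasks first; sorted() is stable, so ties keep their input order.
    for task in sorted(tasks, key=size, reverse=True):
        for vm in vms:
            if vm.fits(task):
                vm.vcores -= task.vcores
                vm.memory_mb -= task.memory_mb
                vm.assigned.append(task.name)
                break
        else:
            # No VM currently has room in both dimensions.
            pending.append(task.name)
    return pending


if __name__ == "__main__":
    vms = [VM("vm-small", 2, 4096), VM("vm-large", 8, 16384)]
    tasks = [
        Task("map-1", 1, 2048),
        Task("map-2", 1, 2048),
        Task("reduce-1", 2, 8192),
        Task("map-3", 4, 4096),
        Task("reduce-2", 2, 8192),
    ]
    pending = first_fit_decreasing_2d(tasks, vms)
    for vm in vms:
        print(vm.name, vm.assigned)
    print("pending:", pending)
    # vm-small ['map-1', 'map-2']
    # vm-large ['reduce-1', 'map-3']
    # pending: ['reduce-2']
```

Because VMs are scanned in a fixed order, first-fit fills earlier VMs first; a best-fit variant, which picks the VM with the least remaining room, is a common alternative.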
