Journal: Journal of Supercomputing
Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment

Abstract

"More data, more information." Big data helps businesses and research communities to gain insights and increase productivity. Many public cloud service providers offer Hadoop MapReduce as a service based on pay-per-use via infrastructure as a service on clusters of virtual machines promising on-demand horizontal scaling. These clusters of virtual machines are launched in various physical machines across racks in cloud data centers. Such multi-tenancy negatively introduces performance heterogeneity for Hadoop virtual machines due to hardware heterogeneity and interference from co-located virtual machine. Performance heterogeneity largely affects MapReduce job latency and resource utilization of rented Hadoop virtual clusters. Default MapReduce schedulers assign map/reduce tasks assuming the hardware is homogeneous. Interference-aware schedulers perform by only observing the interference pattern generated by co-located virtual machines. These schedulers do not consider the heterogeneous performance of virtual machines.Therefore, we propose a dynamic ranking-based MapReduce job scheduler that places the map and reduces tasks based on a virtual machine's performance rank to minimize job latency and improve resource utilization. Our proposed approach calculates the performance score for each virtual machine based on hardware heterogeneity and co-located virtual machine interference. Then, it ranks the virtual machines based on the map and reduce performance separately to place map and reduce tasks. To demonstrate our ideas, we have set a test bed with 29 virtual machines on eight physical machines with different configurations and capacities. We modify a default fair scheduler in Hadoop 2.x to incorporate our ideas and evaluate them with different workloads on the PUMA dataset. The proposed method is then compared against a default fair scheduler (resource-aware) and an interference-aware scheduler based on job latency and resource utilization. 
Finally, we argue in favor of our approach as it improves resource utilization by 30-65% and overall job latency by up to 30%.
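
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so `performance_score`, the `VM` fields, and the greedy `place_tasks` policy below are hypothetical stand-ins chosen only to show per-VM scoring followed by separate map and reduce rankings. In a dynamic setting the scores would be recomputed before each scheduling round.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    map_score: float     # assumed: higher means faster map tasks
    reduce_score: float  # assumed: higher means faster reduce tasks
    free_slots: int      # containers currently available on this VM


def performance_score(hardware_capacity: float, interference: float) -> float:
    """Illustrative score: relative capacity discounted by interference.

    `interference` is a fraction in [0, 1). The formula is an assumption,
    not the one used in the paper.
    """
    if not 0.0 <= interference < 1.0:
        raise ValueError("interference must be in [0, 1)")
    return hardware_capacity * (1.0 - interference)


def rank_vms(vms: List[VM], task_type: str) -> List[VM]:
    """Return the VMs sorted best-first for 'map' or 'reduce' tasks."""
    if task_type == "map":
        return sorted(vms, key=lambda vm: vm.map_score, reverse=True)
    if task_type == "reduce":
        return sorted(vms, key=lambda vm: vm.reduce_score, reverse=True)
    raise ValueError("task_type must be 'map' or 'reduce'")


def place_tasks(vms: List[VM], task_type: str, n_tasks: int) -> List[str]:
    """Greedily fill free slots on the highest-ranked VMs first.

    Returns one VM name per placed task and decrements `free_slots`
    on the VMs that receive tasks.
    """
    placement: List[str] = []
    for vm in rank_vms(vms, task_type):
        while vm.free_slots > 0 and len(placement) < n_tasks:
            vm.free_slots -= 1
            placement.append(vm.name)
    return placement
```

Keeping two scores per VM reflects the abstract's point that a machine can rank differently for map and reduce work; a single combined score would hide that difference.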