IEEE Transactions on Parallel and Distributed Systems

Deadline-Aware MapReduce Job Scheduling with Dynamic Resource Availability

Abstract

As MapReduce becomes ubiquitous in large-scale data analysis, many recent studies have shown that MapReduce performance can be improved by different job scheduling approaches, e.g., Fair Scheduler and Capacity Scheduler. However, most existing MapReduce job schedulers assume that the MapReduce cluster is stable and pay little attention to MapReduce clusters with dynamic resource availability. In fact, MapReduce cluster resources may fluctuate because a growing number of Hadoop clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable energy, and cloud platforms hosting heterogeneous workloads. Thus, there is a growing need to provide predictable service in such dynamic environments to users who have strict requirements on job completion times. In this paper, we propose RDS, a Resource and Deadline-aware Hadoop job Scheduler that takes future resource availability into consideration when minimizing job deadline misses. We formulate the job scheduling problem as an online optimization problem and solve it using an efficient receding horizon control algorithm. To aid the control, we design a self-learning model to estimate job completion times. We further extend the design of the RDS scheduler to support flexible performance goals in various dynamic clusters. In particular, we use flexible deadline time bounds instead of a single fixed job completion deadline. We have implemented RDS in the open-source Hadoop implementation and performed evaluations with various benchmark workloads. Experimental results show that RDS substantially reduces the penalty of deadline misses, by at least 36 and 10 percent compared with Fair Scheduler and the Earliest Deadline First (EDF) scheduler, respectively.
In a Hadoop cluster running partially on renewable energy, experimental results show that the green-power-based resource prediction approach further reduces the penalty of deadline misses by 16 percent compared with the Auto-Regressive Integrated Moving Average (ARIMA) prediction approach.
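The abstract describes RDS only at a high level. To illustrate the general receding-horizon idea (plan over a forecast window of future resource availability, commit only the first step, then re-plan), here is a minimal Python sketch built on several hypothetical assumptions that do not come from the paper. Time is slotted. Job work is measured in integer task-slot units. A forecast of free slots per step is given. The flexible deadline bounds are modeled as a soft and a hard deadline with a linear penalty ramp. The optimization is a brute-force search over job priority orders, which stands in for the paper's solver and is practical only for a handful of jobs. The names `Job`, `penalty`, `simulate`, `plan_first_step`, and `run` are illustrative only.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Job:
    name: str
    work: int  # remaining work in task-slot units (hypothetical unit)
    soft: int  # soft deadline as a time step (hypothetical flexible lower bound)
    hard: int  # hard deadline as a time step; assumed hard >= soft


def penalty(job, finish):
    """Hypothetical flexible-deadline penalty: 0 up to the soft deadline,
    rising linearly to 1 at the hard deadline, 1 afterwards. A job that does
    not finish inside the planning window (finish is None) is conservatively
    charged the full penalty."""
    if finish is None or finish > job.hard:
        return 1.0
    if finish <= job.soft:
        return 0.0
    return (finish - job.soft) / (job.hard - job.soft)


def simulate(jobs, order, forecast, now):
    """Hand each step's forecast slots to jobs in priority `order`.
    Returns (allocation for the first step only, total penalty)."""
    remaining = {j.name: j.work for j in jobs}
    finish = {j.name: now for j in jobs if j.work == 0}
    first_alloc = {}
    for k, cap in enumerate(forecast):
        for j in order:
            if remaining[j.name] == 0 or cap == 0:
                continue
            give = min(remaining[j.name], cap)
            remaining[j.name] -= give
            cap -= give
            if k == 0:
                first_alloc[j.name] = give
            if remaining[j.name] == 0:
                finish[j.name] = now + k + 1  # completes at the end of step now + k
    cost = sum(penalty(j, finish.get(j.name)) for j in jobs)
    return first_alloc, cost


def plan_first_step(jobs, forecast, now):
    """Receding-horizon step: evaluate every priority order over the whole
    forecast window, then return only the first step of the cheapest plan."""
    best_alloc, best_cost = {}, float("inf")
    for order in permutations(jobs):
        alloc, cost = simulate(jobs, order, forecast, now)
        if cost < best_cost:
            best_alloc, best_cost = alloc, cost
    return best_alloc, best_cost


def run(jobs, capacity, horizon):
    """Closed loop: re-plan at every step and commit the first step.
    The forecast is the true future capacity (perfect-predictor assumption)."""
    jobs = [Job(j.name, j.work, j.soft, j.hard) for j in jobs]  # leave input untouched
    finish = {}
    for t in range(len(capacity)):
        active = [j for j in jobs if j.work > 0]
        if not active:
            break
        alloc, _ = plan_first_step(active, capacity[t:t + horizon], t)
        for j in active:
            j.work -= alloc.get(j.name, 0)
            if j.work == 0:
                finish[j.name] = t + 1
    return finish


if __name__ == "__main__":
    jobs = [Job("A", work=4, soft=4, hard=6), Job("B", work=2, soft=2, hard=3)]
    print(run(jobs, capacity=[2, 2, 0, 0, 2, 2], horizon=6))  # {'B': 1, 'A': 5}
```

In the usage example, capacity drops to zero at steps 2 and 3. Because the planner sees the dip within its 6-step window, it runs B first; running A first would push B past its hard deadline, while running B first only pushes A partway between its soft and hard deadlines. A real scheduler would also have to cope with forecast errors, which this sketch sidesteps by using the true capacity as the forecast.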