Conference: 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing

A Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment

Abstract

The specific choice of workload task scheduler for Hadoop MapReduce applications can have a dramatic effect on job workload latency. The Hadoop Fair Scheduler (FairS) assigns resources to jobs such that all jobs get, on average, an equal share of resources over time. It thus addresses a problem of the FIFO scheduler, in which short jobs have to wait for long-running jobs to complete. We show that even under FairS, jobs are still forced to wait significantly when the MapReduce system assigns an equal share of resources, because of the dependencies among the Map, Shuffle, Sort, and Reduce phases. We propose a Hybrid Scheduler (HybS) algorithm based on dynamic priority to reduce the latency of variable-length concurrent jobs while maintaining data locality. The dynamic priorities can accommodate multiple task lengths, job sizes, and job waiting times by applying a greedy fractional knapsack algorithm to assign job tasks to processors. The estimated runtimes of Map and Reduce tasks that feed the HybS dynamic priorities are obtained from historical Hadoop log files. In addition to dynamic priority, we reorder task-to-processor assignments according to data availability, so that the benefits of data locality are maintained automatically in this environment. We evaluate our approach by developing a simulator and by running concurrent workloads consisting of the Word-count and Terasort benchmarks and a satellite scientific data processing workload. Our evaluation shows that HybS improves the average response time of the workloads by approximately 2.1x over the Hadoop FairS, with a standard deviation of 1.4x.
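The abstract names three mechanisms (runtime estimates from historical logs, dynamic priorities resolved with a greedy fractional knapsack, and locality-aware reordering) but does not give the HybS priority formula or data structures. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Job` fields, the `priority` formula (short estimated tasks and long waits raise priority), `DEFAULT_RUNTIME`, and the `aging` constant are all hypothetical. Free task slots are the knapsack capacity, each job's pending tasks are its weight, and the priority is its value density.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Job:
    name: str
    pending_tasks: int   # tasks still waiting for a slot
    wait_time: float     # seconds since submission (hypothetical field)
    history: list = field(default_factory=list)  # past task runtimes (s), e.g. parsed from job logs


DEFAULT_RUNTIME = 30.0  # hypothetical fallback when a job has no history


def estimated_runtime(job):
    """Estimate per-task runtime as the mean of historical runtimes."""
    return mean(job.history) if job.history else DEFAULT_RUNTIME


def priority(job, aging=60.0):
    """Hypothetical dynamic priority (value per slot).

    Shorter estimated tasks score higher; waiting time "ages" a job upward
    so long jobs are not starved. Not the formula from the paper.
    """
    return (1.0 + job.wait_time / aging) / estimated_runtime(job)


def assign_slots(jobs, free_slots):
    """Greedy fractional knapsack over free task slots.

    Capacity = free slots, weight = pending tasks, value density = priority.
    Jobs are taken in descending density; the last job admitted may get
    only a fraction of its pending tasks.
    """
    allocation = {}
    remaining = free_slots
    for job in sorted(jobs, key=priority, reverse=True):
        if remaining == 0:
            break
        take = min(job.pending_tasks, remaining)
        if take > 0:
            allocation[job.name] = take
            remaining -= take
    return allocation


def order_for_node(node, tasks):
    """Reorder (task_id, replica_hosts) pairs so node-local tasks come first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(tasks, key=lambda task: node not in task[1])
```

For example, with eight free slots, a short Word-count job (`[10.0, 12.0]` s history) outranks a Terasort job whose tasks take about 60 s even after two minutes of waiting, so the sketch gives Word-count all 4 of its tasks and Terasort the remaining 4 of its 10. Sorting by value density is optimal for the fractional knapsack problem under this linear value model; whether HybS uses exactly this model is not stated in the abstract.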