A Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment

Abstract

The choice of task scheduler for Hadoop MapReduce applications can have a dramatic effect on job latency. The Hadoop Fair Scheduler (FairS) assigns resources to jobs such that all jobs get, on average, an equal share of resources over time. It thus addresses the problem with a FIFO scheduler, in which short jobs have to wait for long-running jobs to complete. We show that even under FairS, jobs are still forced to wait significantly when the MapReduce system shares resources equally, because of dependencies between the Map, Shuffle, Sort, and Reduce phases. We propose a Hybrid Scheduler (HybS) algorithm based on dynamic priority to reduce the latency of variable-length concurrent jobs while maintaining data locality. The dynamic priorities accommodate multiple task lengths, job sizes, and job waiting times by applying a greedy fractional knapsack algorithm to the assignment of job tasks to processors. The estimated runtimes of Map and Reduce tasks are supplied to the HybS dynamic priorities from historical Hadoop log files. In addition to dynamic priority, we reorder the task-to-processor assignment to account for data availability, which automatically maintains the benefits of data locality in this environment. We evaluate our approach by running concurrent workloads consisting of the Word-count and Terasort benchmarks and a satellite scientific data processing workload, and by developing a simulator. Our evaluation shows that the HybS system improves the average response time for the workloads by approximately 2.1x over the Hadoop FairS, with a standard deviation of 1.4x.
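
The abstract does not give the HybS priority formula, so the following is only a minimal sketch of the general idea: a dynamic priority feeding a greedy fractional knapsack over processor slots. The `Job` fields, the `dynamic_priority` score, and its weights are hypothetical assumptions made for illustration, not the authors' definitions, and data locality reordering is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending_tasks: int       # slots the job could use now (knapsack "weight")
    est_task_runtime: float  # seconds per task, e.g. estimated from past logs
    waiting_time: float      # seconds the job has waited so far


def dynamic_priority(job, wait_weight=1.0, length_weight=1.0):
    """Hypothetical score: grows with waiting time, shrinks with remaining work."""
    remaining_work = job.pending_tasks * job.est_task_runtime
    return (1.0 + wait_weight * job.waiting_time) / (1.0 + length_weight * remaining_work)


def assign_slots(jobs, total_slots):
    """Greedy fractional knapsack: fill slots by priority per demanded slot."""
    runnable = [j for j in jobs if j.pending_tasks > 0]
    ranked = sorted(
        runnable,
        key=lambda j: dynamic_priority(j) / j.pending_tasks,
        reverse=True,
    )
    allocation = {}
    free = total_slots
    for job in ranked:
        if free == 0:
            break
        # The last job served may receive only a fraction of its demand.
        granted = min(job.pending_tasks, free)
        allocation[job.name] = granted
        free -= granted
    return allocation


jobs = [
    Job("long_sort", pending_tasks=40, est_task_runtime=60.0, waiting_time=10.0),
    Job("short_count", pending_tasks=4, est_task_runtime=5.0, waiting_time=10.0),
    Job("medium_scan", pending_tasks=10, est_task_runtime=20.0, waiting_time=30.0),
]
allocation = assign_slots(jobs, total_slots=20)
print(allocation)
```

With these made-up numbers, the short and medium jobs receive all the slots they ask for and the long job gets the remaining fraction of its demand, which is the behavior a fractional knapsack allocation is meant to produce. Recomputing the priorities on every scheduling round is what would make them dynamic, since waiting times and remaining work change between rounds.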

Bibliographic details

Source: 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing, 2012, pp. 161-167 (7 pages)
Conference location: Chicago, IL (US)
Authors: Nguyen Phuong; Simon Tyler; Halem Milton; Chapman David; Le Quang
Original format: PDF
Language: English
Chinese Library Classification: General calculators and computers
Keywords: Hadoop; MapReduce; Scheduler; dynamic priority; scheduling; workflow
