New Scheduling Algorithms for Improving Performance and Resource Utilization in Hadoop YARN Clusters


Abstract

The MapReduce framework has become the de facto scheme for scalable processing of semi-structured and unstructured data in recent years. The Hadoop ecosystem has evolved into its second generation, Hadoop YARN, which adopts fine-grained resource management schemes for job scheduling. Fairness and efficiency are the two main concerns in YARN resource management because resources in YARN are shared and contended for by multiple applications. However, the current scheduling in YARN does not yield the optimal resource arrangement, which leaves resources idle unnecessarily and makes scheduling inefficient. It ignores the dependency between tasks, which is crucial for efficient resource utilization, as well as the heterogeneous job features found in real application environments. We thus propose a new YARN scheduler that can effectively reduce the makespan (i.e., the total execution time) of a batch of MapReduce jobs in Hadoop YARN clusters by leveraging information about requested resources, resource capacities, and the dependency between tasks. To accommodate heterogeneity in MapReduce jobs, we also extend our scheduler to consider job iteration information in its scheduling decisions. We implemented the new scheduling algorithm as a pluggable scheduler in YARN and evaluated it with a set of classic MapReduce benchmarks. The experimental results demonstrate that our YARN scheduler effectively reduces makespans and improves resource utilization.
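To make the notion of makespan concrete, the following is a minimal sketch, not the scheduler proposed in the paper. It assumes a single resource dimension, a fixed cluster capacity, and a simple greedy policy that walks the jobs in a given order; the `Job` fields, the `makespan` function, and the two sample jobs are all hypothetical names chosen for illustration. The only dependency modeled is that a job's reduce tasks start after all of its map tasks finish.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job description; every field is an illustrative assumption.
    name: str
    map_tasks: int      # number of map tasks
    map_demand: int     # resource units requested per map task
    map_time: int       # duration of each map task
    reduce_tasks: int   # number of reduce tasks
    reduce_demand: int  # resource units requested per reduce task
    reduce_time: int    # duration of each reduce task


def makespan(jobs, capacity):
    """Greedily run jobs in list order and return the total execution time."""
    for j in jobs:
        if max(j.map_demand, j.reduce_demand) > capacity:
            raise ValueError(f"a task of job {j.name} can never fit")
    maps_left = [j.map_tasks for j in jobs]
    maps_running = [0] * len(jobs)
    reduces_left = [j.reduce_tasks for j in jobs]
    running = []  # min-heap of (finish_time, job_index, is_map, demand)
    free, now = capacity, 0
    while True:
        # Launch every task that is ready and fits, visiting jobs in order.
        for i, j in enumerate(jobs):
            while maps_left[i] > 0 and j.map_demand <= free:
                free -= j.map_demand
                maps_left[i] -= 1
                maps_running[i] += 1
                heapq.heappush(running, (now + j.map_time, i, True, j.map_demand))
            # Dependency: reduces wait until all maps of the job are done.
            maps_done = maps_left[i] == 0 and maps_running[i] == 0
            while maps_done and reduces_left[i] > 0 and j.reduce_demand <= free:
                free -= j.reduce_demand
                reduces_left[i] -= 1
                heapq.heappush(running, (now + j.reduce_time, i, False, j.reduce_demand))
        if not running:
            return now  # nothing running and nothing launchable: batch is done
        # Advance to the next completion and release everything ending then.
        now = running[0][0]
        while running and running[0][0] == now:
            _, i, is_map, demand = heapq.heappop(running)
            free += demand
            if is_map:
                maps_running[i] -= 1


# Two hypothetical jobs sharing a cluster with 4 resource units.
job_a = Job("A", map_tasks=2, map_demand=2, map_time=4,
            reduce_tasks=1, reduce_demand=2, reduce_time=2)
job_b = Job("B", map_tasks=2, map_demand=1, map_time=1,
            reduce_tasks=1, reduce_demand=4, reduce_time=3)

if __name__ == "__main__":
    print("A then B:", makespan([job_a, job_b], capacity=4))
    print("B then A:", makespan([job_b, job_a], capacity=4))
```

Under these assumptions the two orderings yield different makespans for the same batch, because a large reduce task can sit waiting while another job's maps hold part of the capacity. This is the kind of idle-resource effect the abstract attributes to scheduling that ignores task dependencies and resource demands.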

Bibliographic record

  • Source
    Cloud Computing, IEEE Transactions on | 2021, Issue 3 | pp. 1158-1171 | 14 pages
  • Author affiliations

    Northeastern Univ Dept Elect & Comp Engn 360 Huntington Ave Boston MA 02115 USA;

    Northeastern Univ Dept Elect & Comp Engn 360 Huntington Ave Boston MA 02115 USA;

    Montclair State Univ Dept Comp Sci 1 Normal Ave Montclair NJ 07043 USA;

    Univ Massachusetts Dept Comp Sci 100 Morrissey Blvd Boston MA 02125 USA;

    Northeastern Univ Dept Elect & Comp Engn 360 Huntington Ave Boston MA 02115 USA;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    MapReduce; Resource Management; YARN; Data Processing;
