IEEE Transactions on Parallel and Distributed Systems

Energy-Efficient Task Scheduling for CPU-Intensive Streaming Jobs on Hadoop

Abstract

Hadoop, especially Hadoop 2.0, has been a dominant framework for real-time big data processing. However, Hadoop is not optimized for energy efficiency. To address this problem, we propose a new framework to improve the energy efficiency of Hadoop 2.0. We focus on YARN, the resource manager in Hadoop 2.0, and propose energy-efficient task scheduling mechanisms on YARN. In particular, we focus on CPU-intensive streaming jobs and classify them into two types: batch streaming jobs (i.e., a set of jobs submitted simultaneously) and online streaming jobs (i.e., jobs submitted continuously, one by one). We devise a different energy-efficient task scheduling algorithm for each type. Specifically, we first build an abstract model of performance and energy consumption that considers the characteristics of tasks as well as the computational resources in YARN. Based on this model, we study the energy efficiency of streaming tasks, which combines the performance model and the energy consumption model of a task. We propose two key principles for improving energy efficiency: 1) CPU-usage-aware task allocation, which partitions tasks among NodeManagers (NMs) based on each task's CPU usage characteristics; and 2) resource-efficient task allocation, which reduces idle resources. We then propose a D-based binning algorithm for batch task scheduling and a K-based binning algorithm for online task scheduling that can adapt to continuously arriving tasks. We conduct extensive experiments on a real Hadoop 2.0 cluster and use two kinds of workloads to evaluate the performance and energy efficiency of our proposal. Compared with Storm (the streaming data processing tool in Hadoop 2.0) and other approaches, including TAPA and DVFS-MR, our proposal is more energy efficient. The batch task scheduling algorithm reduces energy consumption by up to 10 percent while keeping performance comparable. In addition, the online task scheduling algorithm reduces energy consumption by up to 7 percent compared with the existing algorithms.
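
The abstract does not spell out the D-based or K-based binning algorithms, so the sketch below is not the authors' method. It only illustrates the general idea behind the two stated principles (grouping tasks by CPU usage and leaving as little idle capacity as possible) with two textbook bin-packing heuristics: first-fit decreasing for a batch of tasks known up front, and plain first-fit for tasks that arrive one at a time. The function names, the list-of-lists node representation, and the use of integer CPU percentages are all assumptions made for illustration.

```python
def pack_batch(cpu_demands, capacity=100):
    """Batch case (generic first-fit decreasing, not the paper's D-based algorithm).

    cpu_demands: CPU usage of each task, in percent of one node (hypothetical unit).
    Returns a list of nodes; each node is the list of task demands placed on it.
    """
    nodes = []
    # Sorting largest-first is possible only because the whole batch is known.
    for demand in sorted(cpu_demands, reverse=True):
        place_online(nodes, demand, capacity)
    return nodes


def place_online(nodes, demand, capacity=100):
    """Online case (generic first-fit, not the paper's K-based algorithm).

    Places one arriving task on the first node with enough spare CPU,
    opening a new node only when no existing node can host the task.
    Mutates and returns `nodes`.
    """
    if demand > capacity:
        raise ValueError("task demand exceeds the capacity of a single node")
    for node in nodes:
        if sum(node) + demand <= capacity:
            node.append(demand)
            return nodes
    nodes.append([demand])
    return nodes


def idle_capacity(nodes, capacity=100):
    """Total unused CPU across the nodes that are in use."""
    return sum(capacity - sum(node) for node in nodes)
```

For the demands 50, 70, 30 and 50, `pack_batch` fills two nodes completely, while feeding the same demands to `place_online` in arrival order needs three nodes. That gap is the reason a batch scheduler and an online scheduler are usually designed separately, as the abstract does.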
