
Analysis and Modeling of Resource Management Overhead in Hadoop YARN Clusters


Abstract

Hadoop is a widely used distributed computing framework for big data processing. Yet Another Resource Negotiator (YARN) was introduced in Hadoop 2.0; it provides container-based resource partitioning and allocation to subdivided units of computation. Hadoop YARN, in combination with the Hadoop Distributed File System (HDFS), possesses almost all the characteristics of a distributed operating system. A container consists of Java virtual machines initiated with dedicated allocations of memory and CPU shares. When jobs are split into small tasks and scheduled to run on containers created dynamically on the nodes of a cluster, the resource management overhead has a significant impact on the execution time of applications. This overhead depends on the number of component tasks into which the job is split. The work presented in this paper evaluates the resource management overhead in Hadoop YARN clusters. The results help users select an appropriate split level for their jobs, minimizing the overhead and maximizing the performance of distributed applications deployed on the cluster. To evaluate the overhead, MapReduce jobs are run with identical execution parallelism on input files of the same size while the split size is varied. The resource manager overhead is estimated from the variation in the application's completion time at different split levels. A regression model is developed to estimate the execution time of jobs on a cluster from the size and split size of the input file.
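The abstract does not state the form of the regression model, so the following Python sketch is only an illustration of the idea, not the authors' method. It assumes a linear model T = b0 + b1 * input_size + b2 * tasks, where the number of map tasks is taken to be ceil(input_size / split_size), and it reads b2 as a per-task (per-container) overhead. The function names, the coefficients, and the measurements are hypothetical; the data is synthetic, generated from known coefficients purely to show that the fit recovers them.

```python
import math


def num_map_tasks(input_mb, split_mb):
    """Number of input splits; assumes one map task (one container) per split."""
    return math.ceil(input_mb / split_mb)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(samples):
    """Least-squares fit of T = b0 + b1*input_mb + b2*tasks (assumed model form).

    samples: iterable of (input_mb, split_mb, seconds).
    """
    rows = [[1.0, float(size), float(num_map_tasks(size, split))]
            for size, split, _ in samples]
    ys = [t for _, _, t in samples]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, input_mb, split_mb):
    """Estimated completion time in seconds under the assumed linear model."""
    b0, b1, b2 = coef
    return b0 + b1 * input_mb + b2 * num_map_tasks(input_mb, split_mb)


# Synthetic measurements (NOT from the paper), generated from made-up
# coefficients: 20 s fixed cost, 0.05 s per MB, 1.5 s per task.
SYNTHETIC = [
    (size, split, 20.0 + 0.05 * size + 1.5 * num_map_tasks(size, split))
    for size in (1024, 2048, 4096)
    for split in (32, 64, 128, 256)
]

coef = fit(SYNTHETIC)
```

Under these assumptions, `predict(coef, 2048, 64)` and `predict(coef, 2048, 256)` differ only through the task-count term, which is the kind of comparison a user would make when choosing a split level. On a real cluster the samples would come from measured job completion times, and a linear form may not be adequate.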
