Published in: IEEE International Conference on Pervasive Intelligence and Computing

Analysis and Modeling of Resource Management Overhead in Hadoop YARN Clusters

Abstract

Hadoop clusters are a widely used distributed computing framework for big data processing. Yet Another Resource Negotiator (YARN) was introduced in Hadoop 2.0; it provides container-based resource partitioning and allocation to subdivided units of computation. Hadoop YARN, in combination with the Hadoop Distributed File System (HDFS), possesses almost all the characteristics of a distributed operating system. A container consists of Java virtual machines started with a dedicated allocation of memory and CPU shares. When jobs are split into small tasks and scheduled to run on containers created dynamically on the nodes of a cluster, the resource management overhead has a significant impact on the execution time of applications. This overhead depends on the number of component tasks into which the job is split. The work presented in this paper evaluates the resource management overhead in Hadoop YARN clusters. The results help users select an appropriate split level for jobs, so as to minimize the overhead and maximize the performance of distributed applications deployed on the cluster. To evaluate the overhead, MapReduce jobs are run with identical execution parallelism on input files of the same size while the split size is varied. The resource manager overhead is estimated from the variation in the application's completion time at different split levels. A regression model is developed to estimate the execution time of jobs on a cluster from the size and split size of the input file.
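
The abstract does not state the form of the regression model, so the following is only a minimal sketch of one plausible shape. It assumes that execution time is linear in the input size and in the number of splits (input size divided by split size, rounded up). The function names, the linear form, and the coefficients are all hypothetical, and the timings are synthetic values generated from those assumed coefficients, not measurements from the paper.

```python
import math


def num_splits(file_size_mb, split_size_mb):
    """Number of tasks if the input is cut into fixed-size splits."""
    return math.ceil(file_size_mb / split_size_mb)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_execution_time(samples):
    """Ordinary least squares for the ASSUMED model
    T = b0 + b1 * size + b2 * num_splits.

    samples: iterable of (file_size_mb, split_size_mb, seconds).
    Returns [b0, b1, b2].
    """
    rows = [[1.0, float(size), float(num_splits(size, split))]
            for size, split, _ in samples]
    y = [t for _, _, t in samples]
    k = 3
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, file_size_mb, split_size_mb):
    """Estimated execution time in seconds under the assumed model."""
    b0, b1, b2 = coeffs
    return b0 + b1 * file_size_mb + b2 * num_splits(file_size_mb, split_size_mb)


# Hypothetical coefficients: fixed cost, seconds per MB, seconds per task.
TRUE_COEFFS = (20.0, 0.05, 1.5)

# Synthetic, noise-free timings generated from TRUE_COEFFS (illustration only).
samples = [(size, split, predict(TRUE_COEFFS, size, split))
           for size in (1024, 2048, 4096)
           for split in (32, 64, 128, 256)]

coeffs = fit_execution_time(samples)
print("fitted coefficients:", coeffs)
print("estimated time for 2048 MB at 128 MB splits:", predict(coeffs, 2048, 128))
```

In this sketch the coefficient on the number of splits plays the role of a per-task resource management cost: with the input size held fixed, it is the slope of completion time against the number of tasks, which mirrors how the abstract describes estimating overhead from the variation in completion time across split levels.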
