Journal: Journal of ambient intelligence and humanized computing

Historical data based approach for straggler avoidance in a heterogeneous Hadoop cluster

Abstract

Cloud computing has emerged as a new way of sharing resources. MapReduce has become the de facto standard for cloud computing and supports data-intensive computation in parallel. Hadoop is an open-source framework that allows the implementation of MapReduce on a cluster of commodity hardware. An environment that mixes different generations of commodity hardware (nodes) introduces heterogeneity into the Hadoop environment. Today heterogeneity has become common in industry as well as in research centers. Hadoop's current implementation assumes that the nodes in the environment are homogeneous and distributes the workload evenly among them. In a heterogeneous Hadoop environment, this homogeneity assumption creates a load imbalance among the nodes, which in turn leads to stragglers. Stragglers are nodes that are available in the environment but perform very poorly. This paper proposes a Historical data based data placement (HDBDP) policy that balances the workload among heterogeneous nodes according to their computing capabilities, in order to improve the data locality of Map tasks and reduce the job turnaround time in a heterogeneous Hadoop environment. The approach introduces an agent that measures each node's computing capability using job history information. The agent also helps the NameNode decide the block count for each node in the environment. The proposed policy's performance is evaluated on Hadoop's most popular benchmark, the HiBench benchmark suite. Compared to Hadoop's default data placement policy and other policies, the proposed HDBDP policy reduces the job turnaround time for several workloads by an average of 14-26%. It also improves the data locality of Map tasks by nearly 27% in a heterogeneous Hadoop environment.
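
The abstract does not give the formula the HDBDP agent uses, so the following is only a minimal sketch of the general idea of capability-proportional block placement, not the authors' implementation. It assumes (hypothetically) that a node's computing capability can be estimated as bytes processed per second over its past Map tasks, and that blocks are split in proportion to that estimate with a largest-remainder rounding step. The function names, the shape of the history records, and the rounding rule are all illustrative assumptions.

```python
from collections import defaultdict


def estimate_capabilities(history):
    """Estimate per-node throughput from past task records.

    `history` is an iterable of (node, bytes_processed, seconds) tuples.
    This record layout is an assumption made for illustration only.
    """
    total_bytes = defaultdict(float)
    total_secs = defaultdict(float)
    for node, nbytes, secs in history:
        total_bytes[node] += nbytes
        total_secs[node] += secs
    # Capability = aggregate bytes per second; skip nodes with no recorded time.
    return {n: total_bytes[n] / total_secs[n]
            for n in total_bytes if total_secs[n] > 0}


def allocate_blocks(capabilities, num_blocks):
    """Split `num_blocks` across nodes in proportion to their capability.

    Uses largest-remainder rounding so the counts always sum to `num_blocks`.
    """
    total = sum(capabilities.values())
    shares = {n: num_blocks * c / total for n, c in capabilities.items()}
    counts = {n: int(s) for n, s in shares.items()}
    leftover = num_blocks - sum(counts.values())
    # Give the remaining blocks to the nodes with the largest fractional share.
    by_fraction = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for n in by_fraction[:leftover]:
        counts[n] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical history: "fast" processes data four times quicker than "slow".
    history = [("fast", 400, 10), ("fast", 400, 10), ("slow", 100, 10)]
    caps = estimate_capabilities(history)
    print(allocate_blocks(caps, 10))
```

Under these assumptions a faster node receives proportionally more blocks, so more of its Map tasks can read local data instead of waiting on slower nodes. A real placement policy would also have to respect HDFS replication and rack-awareness constraints, which this sketch ignores.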