Concurrency and Computation: Practice and Experience

Novel data-placement scheme for improving the data locality of Hadoop in heterogeneous environments



Abstract

To meet the demanding requirements of high-performance big data processing, parallel distributed frameworks such as Hadoop are widely used. In heterogeneous environments, however, Hadoop clusters perform below par, primarily because data blocks are distributed equally across all nodes regardless of differences in individual node capability, which reduces data locality. Hadoop therefore needs a new data-placement scheme that improves data locality in heterogeneous environments. This article proposes a data-placement scheme that preserves, in heterogeneous environments, the same degree of data locality as standard Hadoop while replicating only a small amount of data. In the proposed scheme, only the blocks with the highest probability of being accessed remotely are selected and replicated. Experimental results indicate that the proposed scheme incurs only a 20% disk space overhead while achieving virtually the same data locality ratio as standard Hadoop with a replication factor of three, which incurs a 200% disk space overhead.
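The abstract does not say how the probability of remote access is estimated. The following is a minimal Python sketch under a simplified, hypothetical model, not the paper's method: each node finishes a share of the work proportional to a relative `speed`, blocks beyond that share are assumed to be read remotely by faster nodes, and the blocks most likely to be read remotely are replicated once (onto the fastest other node) until a disk budget is spent. All names (`Node`, `speed`, `remote_fraction`, `select_blocks`, `plan_replicas`, `budget_ratio`) are illustrative assumptions. The `disk_overhead` helper shows the arithmetic behind the abstract's figures: a replication factor of three keeps two extra copies of every block (200%), whereas one extra copy of 20% of the blocks costs 20%.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float  # hypothetical relative processing capability (3.0 = 3x a baseline node)


def remote_fraction(nodes: list[Node]) -> dict[str, float]:
    """Estimated fraction of each node's blocks that will be read remotely.

    Simplified model (an assumption, not the paper's estimator): blocks are spread
    equally, but each node only finishes a share of the work proportional to its
    speed; blocks beyond that share are taken over by faster nodes.
    """
    total_speed = sum(n.speed for n in nodes)
    fractions = {}
    for n in nodes:
        # capacity share (speed / total) divided by the equal block share (1 / len(nodes))
        ratio = n.speed * len(nodes) / total_speed
        fractions[n.name] = max(0.0, 1.0 - ratio)
    return fractions


def select_blocks(placement: dict[str, str], nodes: list[Node],
                  budget_ratio: float = 0.2) -> list[str]:
    """Pick blocks to replicate, highest estimated remote-access probability first,
    capped at budget_ratio * (number of blocks). Blocks never expected to be read
    remotely are skipped even if budget remains."""
    prob = remote_fraction(nodes)
    budget = round(budget_ratio * len(placement))  # whole blocks only
    # sorted() is stable (also with reverse=True), so ties keep placement order
    ranked = sorted(placement, key=lambda b: prob[placement[b]], reverse=True)
    return [b for b in ranked[:budget] if prob[placement[b]] > 0.0]


def plan_replicas(selected: list[str], placement: dict[str, str],
                  nodes: list[Node]) -> dict[str, str]:
    """Put one extra copy of each selected block on the fastest node that does not
    already hold it (a hypothetical target-selection rule; needs >= 2 nodes)."""
    by_speed = sorted(nodes, key=lambda n: n.speed, reverse=True)
    return {b: next(n.name for n in by_speed if n.name != placement[b]) for b in selected}


def disk_overhead(extra_copies: int, num_blocks: int) -> float:
    """Extra storage relative to keeping exactly one copy of every block."""
    return extra_copies / num_blocks


if __name__ == "__main__":
    nodes = [Node("fast", 3.0), Node("slow1", 1.0), Node("slow2", 1.0)]
    # equal, round-robin placement of 30 single-copy blocks
    placement = {f"blk_{i}": nodes[i % 3].name for i in range(30)}

    selected = select_blocks(placement, nodes)
    print(len(selected), disk_overhead(len(selected), len(placement)))  # 6 0.2
    print(disk_overhead(2 * len(placement), len(placement)))           # 2.0 (replication factor 3)
```

In this toy cluster the two slow nodes are each expected to lose 40% of their blocks to remote reads, so the 20% budget (6 of 30 blocks) is spent entirely on their blocks, and each copy lands on the fast node. A real estimator would also account for task scheduling order and job mix, which this sketch ignores.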


