Handling partitioning skew in MapReduce using LEEN - Springer

Abstract

MapReduce is emerging as a prominent tool for big data processing. Data locality is a key feature of MapReduce that is extensively leveraged in data-intensive cloud systems: by co-locating computation with data storage, particularly in the map phase, it avoids network saturation when processing large amounts of data. However, our studies with Hadoop, a widely used MapReduce implementation, demonstrate that partitioning skew (variation in the intermediate keys' frequencies, in their distribution among data nodes, or both) causes a huge amount of data transfer during the shuffle phase and leads to a significantly unfair distribution of reduce inputs among data nodes. As a result, applications suffer severe performance degradation from long data transfers during the shuffle phase combined with computation skew, particularly in the reduce phase. In this paper, we develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme: all buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop. Our experiments demonstrate that LEEN can efficiently achieve higher locality and reduce the amount of shuffled data. More importantly, LEEN guarantees a fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 45% on different workloads.
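
The abstract describes LEEN only at a high level: intermediate keys are assigned to nodes using their per-node frequencies, trading shuffle volume (locality) against balanced reduce inputs (fairness). The Python sketch below contrasts Hadoop's default hash partitioning with a simplified, locality-first and fairness-bounded greedy heuristic. It is NOT the LEEN algorithm from the paper. The function names, the `slack` capacity bound, the `freq[key][node]` input format, and the assumption that each data node hosts exactly one reducer are all illustrative assumptions. Note that the greedy pass needs the complete key-frequency table, which is why a scheme of this kind must buffer map output before partitioning.

```python
from typing import Dict, List
import statistics

# Hypothetical input format: freq[key][node] = number of intermediate records
# with `key` produced by the map tasks on `node`. Each node hosts one reducer.
Freq = Dict[int, List[int]]


def hash_partition(freq: Freq, num_nodes: int) -> Dict[int, int]:
    """Baseline mirroring Hadoop's HashPartitioner for IntWritable keys,
    whose hashCode() is the int value itself."""
    return {k: (k & 0x7FFFFFFF) % num_nodes for k in freq}


def greedy_partition(freq: Freq, num_nodes: int, slack: float = 0.1) -> Dict[int, int]:
    """Illustrative locality-first, fairness-bounded heuristic (not LEEN itself)."""
    totals = {k: sum(counts) for k, counts in freq.items()}
    # Hypothetical bound: no reducer may exceed (1 + slack) times the ideal share.
    cap = (1 + slack) * sum(totals.values()) / num_nodes
    loads = [0] * num_nodes
    assignment: Dict[int, int] = {}
    # Place the heaviest keys first, while every node still has room.
    for k in sorted(freq, key=lambda key: (-totals[key], key)):
        # Prefer the nodes that already hold the most records of this key.
        by_locality = sorted(range(num_nodes), key=lambda j: (-freq[k][j], j))
        target = next((j for j in by_locality if loads[j] + totals[k] <= cap), None)
        if target is None:
            # No node fits within the bound: fall back to the least-loaded node.
            target = min(range(num_nodes), key=lambda j: (loads[j], j))
        assignment[k] = target
        loads[target] += totals[k]
    return assignment


def evaluate(freq: Freq, assignment: Dict[int, int], num_nodes: int) -> dict:
    """Shuffle volume (records that leave their node) and reduce-input balance."""
    total = sum(sum(counts) for counts in freq.values())
    local = sum(freq[k][j] for k, j in assignment.items())
    loads = [0] * num_nodes
    for k, j in assignment.items():
        loads[j] += sum(freq[k])
    return {
        "shuffled": total - local,
        "locality": local / total,
        "reduce_loads": loads,
        "load_stdev": statistics.pstdev(loads),
    }


if __name__ == "__main__":
    # Toy skewed input: 3 nodes, 6 keys, 90 records in total.
    freq = {
        0: [0, 30, 0],
        1: [0, 0, 20],
        2: [10, 0, 0],
        3: [0, 5, 5],
        4: [10, 0, 0],
        5: [5, 5, 0],
    }
    for name, part in (("hash", hash_partition), ("greedy", greedy_partition)):
        print(name, evaluate(freq, part(freq, 3), 3))
```

On this toy input, hash partitioning shuffles all 90 records and produces reduce loads of 40/30/20, while the greedy heuristic shuffles only 10 records and gives every reducer 30. Processing the heaviest keys first is a common bin-packing choice; it is used here for simplicity and is not taken from the paper.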

Bibliographic details

  • Source
    Peer-to-peer networking and applications | 2013, No. 4 | pp. 409-424 | 16 pages
  • Author affiliations

    1. INRIA Rennes-Bretagne Atlantique, Rennes, France;

    2. Cluster and Grid Computing Lab, Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan, China;

    2. Cluster and Grid Computing Lab, Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan, China;

    3. School of Computer Engineering, Nanyang Technological University, Singapore, Singapore;

    1. INRIA Rennes-Bretagne Atlantique, Rennes, France;

    2. Cluster and Grid Computing Lab, Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan, China;

  • Original format: PDF
  • Language: English
  • Keywords

    MapReduce; Hadoop; Cloud computing; Skew partitioning; Intermediate data