Handling partitioning skew in MapReduce using LEEN - Springer

Shadi Ibrahim; Hai Jin; Lu Lu; Bingsheng He; Gabriel Antoniu; Song Wu

Abstract

MapReduce is emerging as a prominent tool for big data processing. Data locality is a key feature in MapReduce that is extensively leveraged in data-intensive cloud systems: it avoids network saturation when processing large amounts of data by co-allocating computation and data storage, particularly for the map phase. However, our studies with Hadoop, a widely used MapReduce implementation, demonstrate that the presence of partitioning skew (that is, variation among data nodes in the intermediate keys' frequencies, in their distributions, or in both) causes a huge amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input among different data nodes. As a result, applications suffer severe performance degradation due to the long data transfer during the shuffle phase along with the computation skew, particularly in the reduce phase. In this paper, we develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop. Our experiments demonstrate that LEEN can efficiently achieve higher locality and reduce the amount of shuffled data. More importantly, LEEN guarantees fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 45% on different workloads.
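To make the two quantities in the abstract concrete, the following is a minimal Python sketch that measures shuffled data and reduce-input balance for a key-to-node assignment. It is not the published LEEN algorithm: the frequency table, the function names, and the greedy rule (balance first, locality as tie-breaker) are all assumptions made purely for illustration, and the round-robin baseline only stands in for a locality-oblivious partitioner.

```python
# Toy numbers invented for illustration; they are not taken from the paper.
# FREQ[key][node] = number of records with this intermediate key that the
# map tasks produced on that node.
FREQ = {
    "k1": [1, 0, 9],
    "k2": [6, 0, 0],
    "k3": [1, 1, 4],
    "k4": [8, 1, 1],
    "k5": [0, 5, 1],
    "k6": [1, 9, 0],
}
NUM_NODES = 3


def evaluate(assignment, freq, num_nodes):
    """Return (shuffled_records, reduce_input_per_node) for a key -> node map."""
    load = [0] * num_nodes
    shuffled = 0
    for key, node in assignment.items():
        total = sum(freq[key])
        load[node] += total
        # Records that are not already on the target node must cross the network.
        shuffled += total - freq[key][node]
    return shuffled, load


def round_robin_partition(freq, num_nodes):
    """Locality-oblivious baseline: keys in sorted order are assigned cyclically."""
    return {key: i % num_nodes for i, key in enumerate(sorted(freq))}


def greedy_partition(freq, num_nodes):
    """Hypothetical heuristic (NOT LEEN itself): visit keys from most to least
    frequent and pick the node that keeps reduce inputs most balanced,
    breaking ties by the fewest remote records."""
    load = [0] * num_nodes
    assignment = {}
    for key in sorted(freq, key=lambda k: (-sum(freq[k]), k)):
        total = sum(freq[key])

        def cost(node):
            trial = load.copy()
            trial[node] += total
            imbalance = max(trial) - min(trial)   # fairness term
            remote = total - freq[key][node]      # locality term
            return (imbalance, remote, node)

        best = min(range(num_nodes), key=cost)
        assignment[key] = best
        load[best] += total
    return assignment


if __name__ == "__main__":
    for name, partition in [("round-robin", round_robin_partition),
                            ("greedy", greedy_partition)]:
        shuffled, load = evaluate(partition(FREQ, NUM_NODES), FREQ, NUM_NODES)
        print(f"{name}: shuffled={shuffled}, reduce input per node={load}")
```

On this toy table the round-robin baseline shuffles 30 of the 48 records and leaves reduce inputs of 20, 12, and 16 records, while the greedy rule shuffles 7 records and gives each node 16. The numbers only illustrate why key frequencies per node matter for both locality and fairness; they say nothing about the performance reported in the paper.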

Bibliographic information

Source
Peer-to-Peer Networking and Applications, 2013, issue 4, pp. 409-424 (16 pages)
Authors
Shadi Ibrahim; Hai Jin; Lu Lu; Bingsheng He; Gabriel Antoniu; Song Wu
Author affiliations

1. INRIA Rennes-Bretagne Atlantique, Rennes, France

2. Cluster and Grid Computing Lab, Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan, China

3. School of Computer Engineering, Nanyang Technological University, Singapore

Original format: PDF
Language: English
Keywords
MapReduce; Hadoop; Cloud computing; Skew partitioning; Intermediate data

Similar literature

1. Handling data skew in joins based on cluster cost partitioning for MapReduce [J]. Wang Yang, Zhong Yong, Ma Qingshan. Multiagent and grid systems, 2018, issue 1.
2. Handling Data Skew in MapReduce Cluster by Using Partition Tuning [J]. Gao Yufei, Zhou Yanjie, Zhou Bing. Journal of healthcare engineering, 2017, issue 1.
3. Handling Data Skew in MapReduce Cluster by Using Partition Tuning [J]. Yufei Gao, Yanjie Zhou, Bing Zhou. Journal of healthcare engineering, 2017, issue 1.
4. LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud [C]. Ibrahim Shadi, Jin Hai, Lu Lu. 2nd IEEE International Conference on Cloud Computing Technology and Science, 2010.
5. Scaling limits of random skew plane partitions [D]. Mkrtchyan, Sevak. 2009.
6. Handling Data Skew in MapReduce Cluster by Using Partition Tuning [O]. Yufei Gao, Yanjie Zhou, Bing Zhou. 2017.
7. Handling Partitioning Skew in MapReduce using LEEN [O]. Ibrahim, Shadi, Jin, Hai, Lu, Lu. 2013.
