International Conference on Web-Age Information Management

SALA: A Skew-Avoiding and Locality-Aware Algorithm for MapReduce-Based Join

Abstract

MapReduce is a parallel programming model that is extensively used to process join operations over large-scale datasets. However, traditional MapReduce-based join is inefficient on skewed data, because skewed data can lead to partitioning skew, which in turn lengthens the response time of the whole join process. Additionally, some recently proposed methods transfer large amounts of intermediate results over the network in the shuffle phase of MapReduce-based join, which can consume considerable time and degrade performance. Here a novel algorithm called SALA is proposed, which employs volume/locality-aware partitioning instead of hash partitioning for data distribution. Compared with other existing join algorithms, SALA has three typical advantages: (1) it ensures that data is distributed evenly across reducers when the input datasets are skewed, (2) it reduces the amount of intermediate results transferred across the network by exploiting data locality, and (3) it requires no modification to the MapReduce framework. Extensive experimental results show that SALA not only achieves better load balance but also reduces network overhead, and therefore significantly speeds up the whole join process in the presence of data skew.
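The abstract does not spell out SALA's partitioning procedure, so the following is only a hypothetical sketch of what "volume/locality-aware partitioning instead of hash partitioning" can mean. It assumes the map side has collected per-node, per-key record counts (`stats`), and that reducer `r` runs on node `r`, so records already on that node need not cross the network. All function names are illustrative. The greedy heuristic places the heaviest join keys first, gives each key to a reducer that still has room under a capacity of `(1 + slack)` times the average load, and among those prefers the node already holding most of that key's records. It assigns whole keys only; a single key larger than one reducer's fair share would need to be split (with the other relation's matching tuples replicated), which is not shown.

```python
import zlib
from collections import defaultdict

# Hypothetical map-side statistics: stats[node][key] = number of intermediate
# records with join key `key` emitted by map tasks on `node`.
# Assumption: reducer r runs on node r, so records already on node r are local.


def key_totals(stats):
    """Total intermediate volume per join key across all nodes."""
    totals = defaultdict(int)
    for per_key in stats.values():
        for key, count in per_key.items():
            totals[key] += count
    return totals


def hash_partition(stats, num_reducers):
    """Baseline: send every record of a key to crc32(key) % num_reducers."""
    return {key: zlib.crc32(key.encode()) % num_reducers for key in key_totals(stats)}


def volume_locality_partition(stats, num_reducers, slack=0.1):
    """Hypothetical greedy volume- and locality-aware assignment of whole keys."""
    totals = key_totals(stats)
    capacity = (1 + slack) * sum(totals.values()) / num_reducers
    load = [0] * num_reducers
    assignment = {}
    # Heaviest keys first, so a big key is placed before reducers fill up.
    for key in sorted(totals, key=lambda k: (-totals[k], k)):
        vol = totals[key]
        fits = [r for r in range(num_reducers) if load[r] + vol <= capacity]
        if fits:
            # Volume-aware filter passed; now prefer the node holding most of this key.
            r = max(fits, key=lambda r: (stats.get(r, {}).get(key, 0), -load[r]))
        else:
            # No reducer has room under the cap: fall back to the least-loaded one.
            r = min(range(num_reducers), key=lambda r: load[r])
        assignment[key] = r
        load[r] += vol
    return assignment


def reducer_loads(stats, assignment, num_reducers):
    """Number of records each reducer receives under an assignment."""
    load = [0] * num_reducers
    for key, vol in key_totals(stats).items():
        load[assignment[key]] += vol
    return load


def network_volume(stats, assignment):
    """Records that must be shuffled: those not already on their reducer's node."""
    return sum(count for node, per_key in stats.items()
               for key, count in per_key.items() if assignment[key] != node)


# Small skewed example: key "a" carries 70 of 200 records.
stats = {
    0: {"a": 50, "b": 5, "c": 5},
    1: {"a": 20, "b": 30, "d": 10},
    2: {"c": 25, "d": 20, "e": 35},
}
aware = volume_locality_partition(stats, 3)
print(reducer_loads(stats, aware, 3))   # [70, 65, 65]
print(network_volume(stats, aware))     # 50
hashed = hash_partition(stats, 3)
print(reducer_loads(stats, hashed, 3), network_volume(stats, hashed))
```

In Hadoop, a precomputed key-to-reducer table like this could in principle be consulted from a custom `org.apache.hadoop.mapreduce.Partitioner`, whose `getPartition(key, value, numPartitions)` returns the reducer index; that is one way to change data distribution without modifying the framework itself, though the abstract does not say whether SALA is built this way.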