SAND Join — A skew handling join algorithm for Google's MapReduce framework


Abstract

The simplicity and flexibility of the MapReduce framework have motivated programmers of large-scale distributed data processing applications to build their applications on it. However, implementations of the framework, including Hadoop, do not handle skew in the input data effectively. Skew in the input data results in poor load balancing, which can swamp the benefits of parallelizing applications on such frameworks. The performance of the join operation, which is the most expensive and most frequently executed operation, degrades severely under heavy skew in the input datasets to be joined. Hadoop's implementation of the join operation cannot handle such skewed joins effectively because it uses hash partitioning for load distribution. In this work, we introduce “Skew hANDling Join” (SAND Join), which employs range partitioning instead of hash partitioning for load distribution. Experiments show that the SAND Join algorithm can efficiently perform joins on datasets that are sufficiently skewed. We also compare the performance of this algorithm with that of Hadoop's join algorithms.
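The contrast between hash and range partitioning under skew can be illustrated with a small Python sketch. This is not the SAND Join algorithm itself, whose details the abstract does not give. It is a generic sample-based range partitioner in which a key that repeats as a split point may span several partitions. All function names (`hash_loads`, `split_points`, `candidate_partitions`, `range_loads`) and the toy dataset are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def hash_loads(keys, n):
    """Tuples per partition when every key is sent to key % n."""
    loads = [0] * n
    for k in keys:
        loads[k % n] += 1  # integer keys only; stands in for hash(key) % n
    return loads


def split_points(sample, n):
    """Upper bound of each of the first n-1 partitions, read at evenly
    spaced positions of the sorted sample (assumes len(sample) >= n).
    A heavy key can show up as more than one bound."""
    s = sorted(sample)
    return [s[(i * len(s)) // n - 1] for i in range(1, n)]


def candidate_partitions(bounds, key):
    """Partitions that may hold `key`: several if the key repeats as a bound."""
    lo = bisect_left(bounds, key)
    hi = bisect_right(bounds, key)
    return range(lo, max(hi, lo + 1))


def range_loads(keys, bounds):
    """Tuples per partition; a key that spans several partitions is
    spread round-robin across them."""
    loads = [0] * (len(bounds) + 1)
    seen = Counter()
    for k in keys:
        parts = candidate_partitions(bounds, k)
        loads[parts[seen[k] % len(parts)]] += 1
        seen[k] += 1
    return loads


# Hypothetical skewed input: two heavy keys plus 200 keys that occur once.
keys = [0] * 500 + [1] * 300 + list(range(2, 202))
# The whole input serves as the "sample" here to keep the run deterministic.
bounds = split_points(keys, 4)

print(hash_loads(keys, 4))         # [550, 350, 50, 50]
print(bounds)                      # [0, 0, 1]
print(range_loads(keys, bounds))   # [250, 250, 300, 200]
```

In this toy run the most loaded partition drops from 550 tuples under hashing to 300 under range partitioning, because the heaviest key is divided between two partitions. In an actual join, tuples from the other input that carry a split key would have to be replicated to every partition returned by `candidate_partitions` so that all matching pairs still meet; a real system would also draw the split points from a random sample rather than the full input.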


Bibliographic record

Source: Proceedings of the 2011 14th IEEE International Multitopic Conference, 2011, 6 pages
Authors: Atta Fariha; Viglas Stratis D.; Niazi Salman
Original format: PDF
Chinese Library Classification: TP3-53

