Conference: IEEE International Multitopic Conference

SAND Join — A skew handling join algorithm for Google's MapReduce framework

Abstract

The simplicity and flexibility of the MapReduce framework have motivated programmers of large-scale distributed data processing applications to build on it. However, implementations of this framework, including Hadoop, do not handle skew in the input data effectively. Skewed input leads to poor load balancing, which can swamp the benefits of parallelizing applications on such frameworks. The performance of the join operation, which is the most expensive and most frequently executed operation, degrades severely when the input datasets to be joined are heavily skewed. Hadoop's implementation of the join operation cannot handle such skewed joins effectively because it uses hash partitioning for load distribution. In this work, we introduce “Skew hANDling Join” (SAND Join), which employs range partitioning instead of hash partitioning for load distribution. Experiments show that the SAND Join algorithm can efficiently perform joins on datasets that are sufficiently skewed. We also compare the performance of this algorithm with that of Hadoop's join algorithms.
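The two load-distribution schemes named in the abstract can be contrasted with a minimal Python sketch. This is not the authors' implementation: the function names, the quantile-based choice of range boundaries, and the toy key set are all assumptions made for illustration, and the keys are deliberately chosen so that modulo hashing collides.

```python
import bisect
from collections import Counter


def hash_partition(key, num_reducers):
    """Hash partitioning: the reducer is the key's hash modulo the reducer count."""
    return hash(key) % num_reducers


def range_boundaries(sample, num_reducers):
    """Pick num_reducers - 1 split points at evenly spaced quantiles of the sample."""
    ordered = sorted(sample)
    return [ordered[i * len(ordered) // num_reducers] for i in range(1, num_reducers)]


def range_partition(key, boundaries):
    """Range partitioning: the reducer is the index of the key range holding the key."""
    return bisect.bisect_right(boundaries, key)


def reducer_loads(keys, assign, num_reducers):
    """Count how many records each reducer receives under the given assignment."""
    counts = Counter(assign(k) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


def toy_keys():
    """Hypothetical input: 1000 keys that are multiples of 4, plus the keys 0..99."""
    return [4 * i for i in range(1000)] + list(range(100))


NUM_REDUCERS = 4
keys = toy_keys()
# Assumption: the "sample" used to pick boundaries is simply the full input.
boundaries = range_boundaries(keys, NUM_REDUCERS)

hash_loads = reducer_loads(keys, lambda k: hash_partition(k, NUM_REDUCERS), NUM_REDUCERS)
range_loads = reducer_loads(keys, lambda k: range_partition(k, boundaries), NUM_REDUCERS)

print("hash loads: ", hash_loads)
print("range loads:", range_loads)
```

In CPython, small non-negative integers hash to themselves, so every multiple of 4 lands on reducer 0 under hashing, while the quantile boundaries split the same records evenly. Note that plain range partitioning, as sketched here, still sends all records sharing one key to a single reducer; how SAND Join treats a single very frequent key is not stated in the abstract.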
