16th International Conference on Very Large Data Bases

Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC)

Abstract

The Super Database Computer (SDC), under development at the University of Tokyo, is a high-performance relational database server for join-intensive workloads. SDC is designed to execute joins in a highly parallel way. Compared with other join algorithms, hash-based algorithms are efficient and easy to parallelize, and many database machines have adopted them. However, when the data is skewed, the conventional parallelizing strategy, which allocates buckets to processing modules (PMs) statically, cannot distribute the load evenly among the PMs, and performance degrades severely.

In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust against data skew. While the relations are being partitioned, each bucket is further divided into equal-sized fragments, and these fragments are temporarily placed on the PMs one at a time. Each bucket is then dynamically allocated to the PM that actually performs its join, and all fragments of the bucket are collected at that PM. In this way, the bucket spreading strategy distributes the load evenly among the PMs, and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy: a mechanism that distributes buckets evenly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust against data skew and achieves very good scalability.
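
To make the contrast between static allocation and bucket spreading concrete, here is a minimal load-accounting sketch in Python. It is not the authors' implementation: the abstract does not specify the fragment size, the order in which fragments are dealt to PMs, or how buckets are assigned in the join phase, so this sketch assumes a hypothetical `FRAGMENT_SIZE`, round-robin fragment placement, and a greedy "largest bucket to least-loaded PM" assignment. All function names (`partition`, `static_allocation`, `spread_fragments`, `allocate_buckets`, `collect`) are illustrative only, and the SDC network hardware is simulated by plain Python loops.

```python
import heapq
from collections import defaultdict

FRAGMENT_SIZE = 4  # hypothetical fragment size, in tuples (assumption)


def partition(keys, num_buckets):
    """Hash each join key into a bucket; returns {bucket_id: [keys]}.
    Python's built-in hash() stands in for the real hash function."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hash(key) % num_buckets].append(key)
    return dict(buckets)


def static_allocation(buckets, num_pms):
    """Conventional strategy: bucket b is fixed to PM (b mod num_pms) in advance.
    Returns the number of tuples each PM must join."""
    load = [0] * num_pms
    for b, tuples in buckets.items():
        load[b % num_pms] += len(tuples)
    return load


def spread_fragments(buckets, num_pms, fragment_size=FRAGMENT_SIZE):
    """Partitioning phase (assumed round-robin): cut every bucket into
    fixed-size fragments and deal them to the PMs one at a time.
    Returns {pm: [(bucket_id, fragment), ...]}."""
    placement = defaultdict(list)
    next_pm = 0
    for b in sorted(buckets):
        tuples = buckets[b]
        for i in range(0, len(tuples), fragment_size):
            placement[next_pm].append((b, tuples[i:i + fragment_size]))
            next_pm = (next_pm + 1) % num_pms
    return placement


def allocate_buckets(buckets, num_pms):
    """Join phase: bucket sizes are known after partitioning, so hand each
    whole bucket (largest first) to the currently least-loaded PM.
    This greedy rule is an assumption, not taken from the paper."""
    heap = [(0, pm) for pm in range(num_pms)]  # (assigned load, pm id)
    owner = {}
    for b in sorted(buckets, key=lambda b: len(buckets[b]), reverse=True):
        load, pm = heapq.heappop(heap)
        owner[b] = pm
        heapq.heappush(heap, (load + len(buckets[b]), pm))
    return owner


def collect(placement, owner, num_pms):
    """Ship every fragment to the PM that owns its bucket; return per-PM join load."""
    load = [0] * num_pms
    for fragments in placement.values():
        for b, frag in fragments:
            load[owner[b]] += len(frag)
    return load


if __name__ == "__main__":
    # Skewed input: keys that are multiples of 4 appear four times, so
    # buckets 0, 4, 8 and 12 (out of 16) are four times larger than the rest.
    keys = list(range(64)) + [k for k in range(64) if k % 4 == 0] * 3
    buckets = partition(keys, num_buckets=16)
    print(static_allocation(buckets, num_pms=4))  # -> [64, 16, 16, 16]
    placement = spread_fragments(buckets, num_pms=4)
    owner = allocate_buckets(buckets, num_pms=4)
    print(collect(placement, owner, num_pms=4))   # -> [28, 28, 28, 28]
```

With static allocation all four heavy buckets land on PM 0, which must join 64 of the 112 tuples. With spreading, each PM temporarily stores 28 tuples after partitioning, and the greedy assignment also gives each PM 28 tuples to join. Because whole buckets are still joined on a single PM, a single bucket larger than the average per-PM load would still dominate; the example's skew is spread across several buckets.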