Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC)

Abstract

The Super Database Computer (SDC) is a high-performance relational database server for join-intensive environments, under development at the University of Tokyo. SDC is designed to execute joins in a highly parallel way. Compared with other join algorithms, a hash-based algorithm is efficient and easily parallelized, and it has been employed by many database machines. However, in the presence of data skew, it is hard to distribute the load equally among processing modules (PMs) by statically allocating buckets to PMs, as the conventional parallelizing strategy does. As a result, performance is severely degraded.

In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust against data skew. While the relations are being partitioned, each bucket is further divided into fragments of equal size, and these fragments are temporarily placed on the PMs one by one. Each bucket is then dynamically allocated to the PM that actually carries out the join of that bucket, and all fragments of the bucket are collected at that PM. In this way, the bucket spreading strategy distributes the load evenly among the PMs, and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy: a mechanism that distributes the buckets evenly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust against data skew and attains very good scalability.
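The contrast between static and dynamic bucket allocation described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: the function names, the use of `key % num_buckets` as the hash function, the synthetic skewed data set, and the largest-first greedy rule for assigning buckets to PMs are all assumptions made for illustration. The abstract does not say how SDC chooses the destination PM, and in SDC the spreading is done by the interconnection network hardware rather than in software.

```python
from collections import defaultdict
import heapq


def make_skewed_keys(num_buckets):
    """Hypothetical skewed input: every 4th bucket receives 600 keys, the rest 100."""
    keys = []
    for b in range(num_buckets):
        count = 600 if b % 4 == 0 else 100
        keys.extend(b + num_buckets * j for j in range(count))
    return keys


def partition_into_buckets(keys, num_buckets):
    """Hash each join key into a bucket (assumed hash: key modulo num_buckets)."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[k % num_buckets].append(k)
    return buckets


def static_loads(buckets, num_pms):
    """Conventional strategy: bucket b is fixed to PM (b mod num_pms) in advance."""
    loads = [0] * num_pms
    for b, tuples in buckets.items():
        loads[b % num_pms] += len(tuples)
    return loads


def spread_fragments(buckets, num_pms):
    """Spreading phase: deal tuples to PMs one by one, so that each PM
    temporarily holds a near-equal fragment of every bucket."""
    fragments = [defaultdict(list) for _ in range(num_pms)]
    cursor = 0
    for b, tuples in buckets.items():
        for t in tuples:
            fragments[cursor % num_pms][b].append(t)
            cursor += 1
    return fragments


def dynamic_loads(buckets, num_pms):
    """Dynamic allocation once bucket sizes are known.
    Assumed policy: largest bucket first, to the currently least-loaded PM."""
    heap = [(0, pm) for pm in range(num_pms)]
    heapq.heapify(heap)
    owner = {}
    for b, tuples in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
        load, pm = heapq.heappop(heap)
        owner[b] = pm
        heapq.heappush(heap, (load + len(tuples), pm))
    loads = [0] * num_pms
    for b, pm in owner.items():
        loads[pm] += len(buckets[b])
    return owner, loads


if __name__ == "__main__":
    buckets = partition_into_buckets(make_skewed_keys(16), 16)
    print("static :", static_loads(buckets, 4))
    print("dynamic:", dynamic_loads(buckets, 4)[1])
```

With this synthetic input and 4 PMs, static allocation gives loads of 2400, 400, 400 and 400 tuples, while the dynamic rule gives 900 tuples on every PM. During the spreading phase each PM also holds 900 tuples, so the partitioning work itself stays balanced regardless of the skew.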

Bibliographic details

Source
16th International Conference on Very Large Data Bases, 1990, pp. 210-221 (12 pages)
Conference location: Brisbane (AU)
Authors
Masaru Kitsuregawa; Yasushi Ogawa
Author affiliations

Institute of Industrial Science, University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan

Research and Development Center, RICOH Co., Ltd., 16-1 Shinei-cho, Kohoku-ku, Yokohama 223, Japan

Original format: PDF
Language of text: English
Chinese Library Classification: special-purpose databases
