
An effective algorithm for parallelizing sort merge joins in the presence of data skew

Abstract

A parallel sort-merge-join algorithm that uses a divide-and-conquer approach to address the data skew problem is proposed. The algorithm adds an extra scheduling phase to the usual sort, transfer and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution for data skew, the algorithm is shown to achieve very good load balancing for the join phase in a CPU-bound environment and to be very robust relative to the degree of data skew and the total number of processors.
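
The scheduling idea in the abstract can be illustrated with a small sketch. This is not the authors' optimization algorithm, which the abstract does not specify. It swaps in a simple longest-processing-time-first greedy heuristic, and it assumes that the join cost of a value is the product of its tuple counts in the two relations. The names `zipf_counts`, `schedule_join`, and `imbalance` are hypothetical, as is the rule that splits a value into `ceil(cost / target)` equal pieces.

```python
import heapq
import math


def zipf_counts(num_values, total_tuples, theta):
    """Tuple counts per join value under a Zipf-like law: count_i ~ 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, num_values + 1)]
    scale = total_tuples / sum(weights)
    return [max(1, round(w * scale)) for w in weights]


def schedule_join(r_counts, s_counts, num_procs):
    """Assign per-value join work to processors; return the load on each processor."""
    # Assumed cost model: a value costs as much as the number of pairs it joins.
    costs = [r * s for r, s in zip(r_counts, s_counts)]
    target = sum(costs) / num_procs  # ideal per-processor load

    tasks = []
    for cost in costs:
        # A value heavier than the target is split across several processors,
        # for example by giving each one a slice of that value's tuples.
        pieces = min(num_procs, max(1, math.ceil(cost / target)))
        tasks.extend([cost / pieces] * pieces)

    # Greedy heuristic: place the largest tasks first, each on the
    # currently least-loaded processor (tracked with a min-heap).
    loads = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(loads)
    for task in sorted(tasks, reverse=True):
        load, proc = heapq.heappop(loads)
        heapq.heappush(loads, (load + task, proc))
    return [load for load, _ in sorted(loads, key=lambda item: item[1])]


def imbalance(loads):
    """Maximum load divided by mean load; 1.0 means a perfect balance."""
    return max(loads) / (sum(loads) / len(loads))


# Example run: two relations with the same Zipf-like skew, 16 processors.
r = zipf_counts(1000, 100_000, theta=1.0)
s = zipf_counts(1000, 100_000, theta=1.0)
print(imbalance(schedule_join(r, s, num_procs=16)))
```

Without the splitting step, the single heaviest value would land on one processor and dominate the join phase, which is the skew problem the abstract describes.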


Bibliographic record

Source: 1990, pp. 103-115 (13 pages)
Authors: Wolf, J.L.; Dias, D.M.
Original format: PDF
CLC classification: Radio electronics, telecommunications technology

Similar literature

1. A parallel sort merge join algorithm for managing data skew [J]. Wolf J.L., Dias D.M. IEEE Transactions on Parallel and Distributed Systems. 1993, No. 1
2. New algorithms for parallelizing relational database joins in the presence of data skew [J]. Wolf J.L., Dias D.M. IEEE Transactions on Knowledge and Data Engineering. 1994, No. 6
3. A parallel hash join algorithm for managing data skew [J]. Wolf J.L., Yu P.S. IEEE Transactions on Parallel and Distributed Systems. 1993, No. 12
4. An effective algorithm for parallelizing sort merge joins in the presence of data skew [C]. Joel L. Wolf, Daniel M. Dias, Philip S. Yu. Databases in parallel and distributed systems. 1990
5. Search and Join Algorithms for Tables in Data Lakes [D]. Zhu, Erkang. 2019
6. Efficient Out of Core Sorting Algorithms for the Parallel Disks Model [O]. Vamsi Kundeti, Sanguthevar Rajasekaran
7. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [O]. Albutiu, Martina-Cezara, Kemper, Alfons, Neumann, Thomas. 2012
8. Multiprocessor Sort-Merge Join Algorithm for Relational Data Bases [R]. Thompson, W. C., Ries, D. R. 1981
