Flow-Join: Adaptive skew handling for distributed joins over high-speed networks

Abstract

Modern InfiniBand interconnects offer link speeds of several gigabytes per second and a remote direct memory access (RDMA) paradigm for zero-copy network communication. Both are crucial for parallel database systems to achieve scalable distributed query processing where adding a server to the cluster increases performance. However, the scalability of distributed joins is threatened by unexpected data characteristics: Skew can cause a severe load imbalance such that a single server has to process a much larger part of the input than its fair share and by this slows down the entire distributed query. We introduce Flow-Join, a novel distributed join algorithm that handles attribute value skew with minimal overhead. Flow-Join detects heavy hitters at runtime using small approximate histograms and adapts the redistribution scheme to resolve load imbalances before they impact the join performance. Previous approaches often involve expensive analysis phases, which slow down distributed join processing for non-skewed workloads. This is especially the case for modern high-speed interconnects, which are too fast to hide the extra computation. Other skew handling approaches require detailed statistics, which are often not available or overly inaccurate for intermediate results. In contrast, Flow-Join uses our novel lightweight skew handling scheme to execute at the full network speed of more than 6 GB/s for InfiniBand 4×FDR, joining a skewed input at 11.5 billion tuples/s with 32 servers. This is 6.8× faster than a standard distributed hash join using the same hardware. At the same time, Flow-Join does not compromise the join performance for non-skewed workloads.
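
The idea of detecting heavy hitters with a small approximate histogram and adapting the redistribution scheme can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the implementation described in the paper: the abstract does not name the histogram algorithm, so the Space-Saving counter below is an assumed stand-in, and the policy of keeping tuples with a heavy key on their origin server (with the matching tuples of the other input broadcast to all servers) is likewise an assumption. The names `SpaceSaving` and `partition_loads`, the capacity, and the threshold are hypothetical.

```python
class SpaceSaving:
    """Fixed-size approximate frequency counter (Space-Saving).

    Assumed stand-in for the "small approximate histogram" in the abstract.
    Counts may overestimate, but a truly frequent key is not lost.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def add(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Replace the least frequent key; the newcomer inherits its count.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1


def partition_loads(keys, num_servers, skew_aware, capacity=16, threshold=100):
    """Return the number of tuples each server receives.

    Tuple i is assumed to start on server i % num_servers. Without skew
    handling every tuple goes to key % num_servers. With skew handling,
    tuples whose key has reached `threshold` in the sketch stay on their
    origin server (the matching tuples of the other input would have to
    be broadcast, which this toy model does not simulate).
    """
    sketch = SpaceSaving(capacity)
    loads = [0] * num_servers
    for i, key in enumerate(keys):
        origin = i % num_servers
        sketch.add(key)
        if skew_aware and sketch.counts.get(key, 0) >= threshold:
            target = origin              # heavy hitter: do not redistribute
        else:
            target = key % num_servers   # regular hash partitioning
        loads[target] += 1
    return loads


# Skewed input: every third tuple carries the same join key 7.
keys = [7 if i % 3 == 0 else i for i in range(1200)]
plain = partition_loads(keys, num_servers=4, skew_aware=False)
aware = partition_loads(keys, num_servers=4, skew_aware=True)
print("hash partitioning:", plain)
print("skew-aware:       ", aware)
```

In this toy run the server responsible for key 7 receives far more than its fair share under plain hash partitioning, while the skew-aware variant narrows the gap once the key crosses the threshold. Detection happens while tuples are being routed, so there is no separate analysis pass over the input.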

Bibliographic details

Source
IEEE International Conference on Data Engineering | 2016 | pp. 1194-1205 | 12 pages
Authors
Wolf Rödiger; Sam Idicula; Alfons Kemper; Thomas Neumann
Original format
PDF
