首页> 外文会议>ACM SIGMOD international conference on management of data >PR-Join: A Non-Blocking Join Achieving Higher Early Result Rate with Statistical Guarantees
【24h】

PR-Join: A Non-Blocking Join Achieving Higher Early Result Rate with Statistical Guarantees

机译:PR-Join:非阻塞加入实现更高的早期结果率,统计保证

获取原文

摘要

Online aggregation is a promising solution to achieving fast early responses for interactive ad-hoc queries that compute aggregates on a large amount of data. Essential to the success of online aggregation is a good non-blocking join algorithm that enables both (ⅰ) high early result rates with statistical guarantees and (ⅱ) fast end-to-end query times. We analyze existing non-blocking join algorithms and find that they all provide sub-optimal early result rates, and those with fast end-to-end times achieve them only by further sacrificing their early result rates. We propose a new non-blocking join algorithm, Partitioned expanding Ripple Join (PR-Join), which achieves considerably higher early result rates than previous non-blocking joins, while also delivering fast end-to-end query times. PR-Join performs separate, ripple-like join operations on individual hash partitions, where the width of a ripple expands multiplicatively over time. This contrasts with the non-partitioned, fixed-width ripples of Block Ripple Join. Assuming, as in previous non-blocking join studies, that the input relations are in random order, PR-Join ensures representative early results that are amenable to statistical guarantees. We show both analytically and with real-machine experiments that PR-Join achieves over an order of magnitude higher early result rates than previous non-blocking joins. We also discuss the benefits of using a flash-based SSD for temporary storage, showing that PR-Join can then achieve close to optimal end-to-end performance. Finally, we consider the joining of finite data streams that arrive over time, and find that PR-Join achieves similar or higher result rates than RPJ, the state-of-the-art algorithm specialized for that domain.
机译:在线聚合是一个有希望的解决方案,可以实现在大量数据上计算聚集体的交互式Ad-hoc查询的快速早期响应。对在线聚合的成功至关重要是一个良好的非阻塞连接算法,可以实现(Ⅰ)高早期结果率与统计保证和(Ⅱ)快速端到端查询时间。我们分析了现有的非阻塞连接算法,并发现它们都提供了次优的早期结果率,并且具有快速端到端时间的人仅通过进一步牺牲早期结果率来实现它们。我们提出了一种新的非阻塞连接算法,分区扩展纹波连接(PR-Join),其比以前的非阻塞连接的早期结果率相当高,同时还提供快速端到端查询时间。 PR-Join对单独的哈希分区执行单独的纹波状加入操作,其中纹波的宽度随着时间的推移而扩展乘法。这与块纹波连接的未分区,固定宽度纹波形成对比。假设,如前所述的非阻时连接研究,输入关系处于随机顺序,PR-Join确保了统计保证的代表性早期结果。我们在分析上展示,并且具有实际机器实验,该实验,PR-Join可以超过比以前的非阻塞加入更高的早期结果率。我们还讨论了使用基于闪存的SSD进行临时存储的好处,显示PR-Join可以实现接近最佳端到端性能。最后,我们考虑加入随时间到达的有限数据流,并发现PR-Join可以实现与RPJ相似或更高的结果率,专门用于该域的最先进的算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号