Conference proceedings: Big data

Sampling Estimators for Parallel Online Aggregation

Abstract

Online aggregation provides estimates of the final result of a computation while the computation is still running. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. When coupled with parallel processing, this allows for interactive data exploration of the largest datasets. In this paper, we identify the main functionality requirements of sampling-based parallel online aggregation: partial aggregation, parallel sampling, and estimation. We argue for overlapped online aggregation as the only scalable solution to combine computation and estimation. We analyze the properties of existing estimators and design a novel sampling-based estimator that is robust to node delay and failure. When executed over a massive 8TB TPC-H instance, the proposed estimator achieves linear scalability and provides accurate confidence bounds early in the execution, even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size.
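
To make the three requirements named in the abstract concrete, here is a minimal sketch of a textbook sampling estimator for a SUM aggregate. It is not the estimator proposed in the paper. It assumes that the tuples scanned so far across all nodes can be treated as one uniform random sample drawn without replacement from the full dataset, an assumption that node delay or failure can break, and it uses a normal-approximation confidence interval. The names `PartialAggregate` and `estimate_sum` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PartialAggregate:
    """Running statistics one node keeps for the tuples it has scanned."""
    count: int = 0          # number of tuples scanned so far
    total: float = 0.0      # sum of the aggregated values
    total_sq: float = 0.0   # sum of squared values, needed for the variance

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        # Partial aggregation: per-node states combine by simple addition.
        return PartialAggregate(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )


def estimate_sum(agg: PartialAggregate, population_size: int, z: float = 1.96):
    """Estimate the full SUM and a confidence interval from a sample.

    Assumes the scanned tuples form a simple random sample without
    replacement; z = 1.96 gives an approximate 95% interval.
    """
    n, big_n = agg.count, population_size
    if n < 2:
        raise ValueError("need at least two sampled tuples")
    mean = agg.total / n
    # Unbiased sample variance, clamped to guard against rounding error.
    variance = max((agg.total_sq - n * mean * mean) / (n - 1), 0.0)
    estimate = big_n * mean
    # Finite population correction: the interval shrinks to zero at n == N.
    fpc = (big_n - n) / big_n
    half_width = z * big_n * math.sqrt(fpc * variance / n)
    return estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.uniform(0.0, 100.0) for _ in range(100_000)]
    rng.shuffle(data)  # random order stands in for parallel sampling
    nodes = [data[i::4] for i in range(4)]  # four hypothetical nodes

    merged = PartialAggregate()
    for chunk in nodes:
        local = PartialAggregate()
        for value in chunk[:500]:  # each node has scanned only a prefix
            local.add(value)
        merged = merged.merge(local)

    print(estimate_sum(merged, len(data)), sum(data))
```

The interval narrows as more tuples are scanned, which is what lets a user stop early once the bounds are tight enough. Merging by addition is valid here only because every node is assumed to progress at the same rate over randomly ordered data.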
