British National Conference on Databases

Sampling Estimators for Parallel Online Aggregation

Abstract

Online aggregation provides estimates of the final result of a computation while the computation is still running. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. When coupled with parallel processing, this enables interactive exploration of even the largest datasets. In this paper, we identify the main functional requirements of sampling-based parallel online aggregation: partial aggregation, parallel sampling, and estimation. We argue that overlapped online aggregation is the only scalable solution for combining computation and estimation. We analyze the properties of existing estimators and design a novel sampling-based estimator that is robust to node delay and failure. When executed over a massive 8 TB TPC-H instance, the proposed estimator provides accurate confidence bounds early in the execution, even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size, and it achieves linear scalability.
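To make the three requirements (partial aggregation, parallel sampling, estimation) more concrete, here is a minimal sketch of a generic, textbook sampling estimator for a SUM aggregate. It is not the estimator proposed in the paper. It assumes that each node scans its partition in random order and keeps a hypothetical partial state (count, sum, sum of squares) that a coordinator can merge at any time. The confidence bound relies on the central limit theorem with a finite-population correction. The names `PartialState` and `estimate_sum` are illustrative only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PartialState:
    """Hypothetical per-node partial aggregate: sufficient statistics of the
    tuples a node has processed so far (scanned in random order)."""
    n: int = 0          # number of tuples processed
    total: float = 0.0  # sum of f(t)
    sq: float = 0.0     # sum of f(t)^2

    def update(self, value: float) -> None:
        self.n += 1
        self.total += value
        self.sq += value * value

    def merge(self, other: "PartialState") -> "PartialState":
        # Partial states combine by addition, so the coordinator can merge
        # snapshots from nodes in any order.
        return PartialState(self.n + other.n, self.total + other.total,
                            self.sq + other.sq)


def estimate_sum(state: PartialState, population: int, z: float = 1.96):
    """CLT-based estimate of SUM over `population` tuples, ASSUMING the merged
    tuples form one simple random sample drawn without replacement.
    Returns (estimate, lower bound, upper bound); z=1.96 gives ~95%."""
    n = state.n
    if n < 2:
        raise ValueError("need at least two sampled tuples")
    mean = state.total / n
    var = (state.sq - n * mean * mean) / (n - 1)  # sample variance
    fpc = 1.0 - n / population                   # finite-population correction
    half_width = z * population * math.sqrt(max(var, 0.0) / n * fpc)
    estimate = population * mean
    return estimate, estimate - half_width, estimate + half_width


# Usage: 4 simulated nodes, each has processed 1,000 tuples of its partition.
random.seed(7)
data = [random.random() for _ in range(100_000)]
partitions = [data[i::4] for i in range(4)]
merged = PartialState()
for part in partitions:
    order = part[:]
    random.shuffle(order)  # random scan order = sampling without replacement
    node = PartialState()
    for v in order[:1_000]:
        node.update(v)
    merged = merged.merge(node)
est, lo, hi = estimate_sum(merged, population=len(data))
```

Because the partial states simply add up, the coordinator can produce an estimate from whichever node snapshots have arrived. This simple sketch does not handle node delay or failure, which is the robustness property the abstract claims for the proposed estimator: if nodes progress at different rates and their partitions are not identically distributed, merging their samples as if they were one simple random sample can bias the estimate.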