Sampling Estimators for Parallel Online Aggregation

Abstract

Online aggregation provides estimates of the final result of a computation while the computation is still running. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. When coupled with parallel processing, this allows for interactive exploration of the largest datasets. In this paper, we identify the main functionality requirements of sampling-based parallel online aggregation: partial aggregation, parallel sampling, and estimation. We argue for overlapped online aggregation as the only scalable solution to combine computation and estimation. We analyze the properties of existing estimators and design a novel sampling-based estimator that is robust to node delay and failure. When executed over a massive 8TB TPC-H instance, the proposed estimator achieves linear scalability and provides accurate confidence bounds early in the execution, even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size.
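
To make the idea of an estimate with confidence bounds concrete, the following is a minimal single-node sketch of online aggregation for a SUM query. It is not the estimator proposed in the paper: it is the textbook estimator for simple random sampling without replacement, with a normal-approximation interval and a finite population correction. The function name `online_sum_estimates` and its parameters `report_every`, `z`, and `seed` are hypothetical, chosen only for illustration.

```python
import math
import random


def online_sum_estimates(values, report_every, z=1.96, seed=0):
    """Yield (rows_seen, estimate, lower, upper) while scanning in random order.

    Illustrative assumption: the whole dataset fits in one list and is
    scanned in a random permutation, which is equivalent to simple random
    sampling without replacement.
    """
    n_total = len(values)
    order = list(range(n_total))
    random.Random(seed).shuffle(order)  # random scan order

    # Welford's running mean and sum of squared deviations.
    count, mean, m2 = 0, 0.0, 0.0
    for idx in order:
        x = values[idx]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        if count % report_every == 0 or count == n_total:
            variance = m2 / (count - 1) if count > 1 else 0.0
            # Finite population correction: shrinks to 0 once all rows are seen.
            fpc = (n_total - count) / n_total
            std_err = n_total * math.sqrt(max(variance * fpc / count, 0.0))
            estimate = n_total * mean  # scale the sample mean up to a SUM
            yield count, estimate, estimate - z * std_err, estimate + z * std_err


if __name__ == "__main__":
    data = [float(i % 100) for i in range(10000)]
    for seen, est, lo, hi in online_sum_estimates(data, report_every=2000):
        print(f"rows={seen:6d} estimate={est:12.1f} bounds=[{lo:12.1f}, {hi:12.1f}]")
```

A caller would stop iterating as soon as the interval is narrow enough for its purposes. Once every row has been scanned the interval collapses onto the exact sum. A parallel version would additionally have to merge per-node partial aggregates and account for nodes progressing at different rates, which is the problem the abstract addresses.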


Bibliographic record

Source: British national conference on databases, 2013, 14 pages
Authors: Chengjie Qin; Florin Rusu
Original format: PDF
Chinese Library Classification: TP311.13
Keywords: parallel databases; estimation; sampling; online aggregation

Similar documents

1. Adaptive Backstepping Control with Online Parameter Estimator for a Plug-and-Play Parallel Converter System in a Power Switcher [J]. Chujia Guo, Aimin Zhang, Hang Zhang. Energies, 2018, No. 12
2. Edge influence and population aggregation: On point and interval statistical performances of Morisita patchiness index estimators in different sampling schemes [J]. Butturi-Gomes Davi, Petrere Jr Miguel. Ecological indicators, 2020, No. Jana
3. Branch aggregation and crown allometry condition the precision of randomized branch sampling estimators of conifer crown mass [J]. Schlecht R. M., Affleck D. L. R. Canadian Journal of Forest Research, 2014, No. 5
4. Sampling Estimators for Parallel Online Aggregation [C]. Chengjie Qin, Florin Rusu. Big data, 2013
5. The Accelerated Cauchy Estimator: A Paradigm for Parallelization [D]. Sanpakit, Chirawat Chriss. 2020
6. On the convergence rates of kernel estimator and hazard estimator for widely dependent samples [O]. Yongming Li, Yong Zhou, Chao Liu
7. PF-OLA: a high-performance framework for parallel online aggregation [O]. Chengjie Qin, Florin Rusu. 2013
8. Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines [R]. Tuminaro, R. S. 2000
