Sampling Estimators for Parallel Online Aggregation

Abstract

Online aggregation provides estimates of the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. When coupled with parallel processing, this allows for the interactive data exploration of the largest datasets. In this paper, we identify the main functionality requirements of sampling-based parallel online aggregation: partial aggregation, parallel sampling, and estimation. We argue for overlapped online aggregation as the only scalable solution to combine computation and estimation. We analyze the properties of existing estimators and design a novel sampling-based estimator that is robust to node delay and failure. When executed over a massive 8TB TPC-H instance, the proposed estimator provides accurate confidence bounds early in the execution, even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size, and achieves linear scalability.
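To make the idea of an estimate with confidence bounds concrete, the following is a minimal single-node sketch of online aggregation for a SUM query. It is not the estimator proposed in the paper: it assumes the textbook estimator for sampling without replacement with a normal-approximation interval, and the names `online_sum_estimates`, `report_every`, and `z` are hypothetical, chosen only for illustration.

```python
import math
import random


def online_sum_estimates(values, report_every, z=1.96, seed=0):
    """Yield (rows_seen, estimate, lower, upper) while scanning in random order.

    Assumption: a uniformly random scan order, so every prefix of the scan
    is a simple random sample without replacement from the full dataset.
    """
    n_total = len(values)
    order = list(range(n_total))
    random.Random(seed).shuffle(order)

    count = 0
    mean = 0.0   # running mean of the rows seen so far
    m2 = 0.0     # running sum of squared deviations (Welford's method)

    for idx in order:
        x = values[idx]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        if count % report_every == 0 or count == n_total:
            sample_var = m2 / (count - 1) if count > 1 else 0.0
            # Finite population correction: the bounds shrink to zero
            # once the whole dataset has been processed.
            fpc = (n_total - count) / n_total
            half_width = z * n_total * math.sqrt(sample_var * fpc / count)
            estimate = n_total * mean  # scale the sample mean up to a SUM
            yield count, estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    data = [float(i % 7) for i in range(1000)]
    for seen, est, lo, hi in online_sum_estimates(data, report_every=250):
        print(f"rows={seen} estimate={est:.1f} bounds=[{lo:.1f}, {hi:.1f}]")
```

A caller would stop consuming the generator as soon as the interval is narrow enough, which corresponds to the early termination described in the abstract. Extending this to several nodes would require combining per-node partial aggregates and handling delayed or failed nodes, which this sketch does not attempt.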

Bibliographic record

Source
Big data | 2013 | pp. 204-217 | 14 pages
Conference location: Oxford (GB)
Authors
Chengjie Qin; Florin Rusu
Author affiliations

University of California, Merced;

University of California, Merced

Original format: PDF
Language: English
Keywords
parallel databases; estimation; sampling; online aggregation

Similar literature


1. Adaptive Backstepping Control with Online Parameter Estimator for a Plug-and-Play Parallel Converter System in a Power Switcher [J]. Chujia Guo, Aimin Zhang, Hang Zhang. Energies. 2018, Issue 12
2. Edge influence and population aggregation: On point and interval statistical performances of Morisita patchiness index estimators in different sampling schemes [J]. Butturi-Gomes Davi, Petrere Jr Miguel. Ecological indicators. 2020, Issue Jana
3. Branch aggregation and crown allometry condition the precision of randomized branch sampling estimators of conifer crown mass [J]. Schlecht R. M., Affleck D. L. R. Canadian Journal of Forest Research. 2014, Issue 5
4. Sampling Estimators for Parallel Online Aggregation [C]. Chengjie Qin, Florin Rusu. British national conference on databases. 2013
5. The Accelerated Cauchy Estimator: A Paradigm for Parallelization [D]. Sanpakit, Chirawat Chriss. 2020
6. On the convergence rates of kernel estimator and hazard estimator for widely dependent samples [O]. Yongming Li, Yong Zhou, Chao Liu
7. PF-OLA: a high-performance framework for parallel online aggregation [O]. Chengjie Qin, Florin Rusu. 2013
8. Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines [R]. Tuminaro, R. S. 2000
