...
首页> 外文期刊>Algorithmica >A General Method for Estimating Correlated Aggregates Over a Data Stream
【24h】

A General Method for Estimating Correlated Aggregates Over a Data Stream

机译:估算数据流中相关聚合的通用方法

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

On a stream of two dimensional data items where is an item identifier and is a numerical attribute, a correlated aggregate query asks to first apply a selection predicate along the dimension, followed by an aggregation along the dimension. For selection predicates of the form or , where parameter is provided at query time, we present new streaming algorithms and lower bounds for estimating correlated aggregates. Our main result is a general method that reduces the estimation of a correlated aggregate to the streaming computation of over an entire stream, for an aggregate that satisfies certain conditions. This results in the first sublinear space algorithms for the correlated estimation of a large family of statistics, including frequency moments. Our experimental validation shows that the memory requirements of these algorithms are significantly smaller than existing linear storage solutions, and that these achieve a fast per-record processing time. We also study the setting when items have weights. In the case when weights can be negative, we give a strong space lower bound which holds even if the algorithm is allowed up to a logarithmic number of passes over the data. We complement this with a small space algorithm which uses a logarithmic number of passes.
机译:在一个二维数据项流中,该项是项标识符并且是数字属性,相关的聚合查询要求首先沿维度应用选择谓词,然后沿维度应用聚合。对于形式为或的选择谓词,其中在查询时提供了参数,我们提出了新的流算法和下界来估计相关聚合。我们的主要结果是一种通用方法,对于满足某些条件的聚合,该方法可以将相关聚合的估计减少到整个流的流计算。这导致了第一个亚线性空间算法,用于对包括频率矩在内的大量统计数据进行相关估计。我们的实验验证表明,这些算法的内存要求比现有的线性存储解决方案要小得多,并且这些实现了快速的每条记录处理时间。我们还研究了具有权重的项目的设置。在权重可以为负的情况下,即使给定数据的对数传递次数达到对数,我们也会给出一个强大的空间下界。我们通过使用对数遍数的小空间算法对此进行补充。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号