Journal: Data Mining and Knowledge Discovery

Identifying correlated heavy-hitters in a two-dimensional data stream


Abstract

We consider online mining of correlated heavy-hitters (CHH) from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes an aggregate along a secondary dimension. Prior work on identifying heavy-hitters in streams has focused almost exclusively on single-dimensional streams, and such methods yield little insight into the properties of heavy-hitters along other dimensions. In typical applications, however, an analyst is interested not only in identifying heavy-hitters, but also in understanding further properties, such as which other items appear frequently along with a heavy-hitter, or what the frequency distribution is of the items that appear along with the heavy-hitters. We consider queries of the following form: "In a stream S of (x, y) tuples, on the substream H of all x values that are heavy-hitters, maintain those y values that occur frequently with the x values in H". We call this problem CHH. We give an approximate formulation of CHH identification and present an algorithm for tracking CHHs on a data stream. The algorithm is easy to implement and uses workspace much smaller than the stream itself. We present provable guarantees on the maximum error, as well as detailed experimental results that demonstrate the space-accuracy trade-off.
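To make the query concrete, the following is a minimal sketch of an exact, non-streaming reference computation of CHHs. It is not the algorithm presented in the paper, which is approximate and uses far less space than the stream. The function name `exact_chh` and the two thresholds `phi1` (fraction of the stream an x value must reach) and `phi2` (fraction of that x value's occurrences a y value must reach) are assumptions introduced here for illustration only.

```python
from collections import Counter, defaultdict


def exact_chh(stream, phi1, phi2):
    """Exact reference for the CHH query (hypothetical helper, not the paper's method).

    An x value is treated as a heavy-hitter if it accounts for at least
    phi1 of all tuples. A y value is reported for that x if the pair (x, y)
    accounts for at least phi2 of the tuples whose first component is x.
    Both thresholds are illustrative assumptions.
    """
    stream = list(stream)
    n = len(stream)

    # Frequency of each x, and frequency of each y within each x's substream.
    x_counts = Counter(x for x, _ in stream)
    pair_counts = defaultdict(Counter)
    for x, y in stream:
        pair_counts[x][y] += 1

    result = {}
    for x, fx in x_counts.items():
        if fx >= phi1 * n:  # x is a heavy-hitter on the primary dimension
            result[x] = {
                y for y, fxy in pair_counts[x].items() if fxy >= phi2 * fx
            }
    return result


stream = [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("a", 1), ("c", 3)]
chh = exact_chh(stream, phi1=0.5, phi2=0.5)
print(chh)
```

This baseline stores a counter for every distinct x and every distinct (x, y) pair, so its memory grows with the stream. That cost is what a small-workspace streaming algorithm with bounded error is meant to avoid.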