首页> 外文期刊>Future Internet >On Frequency Estimation and Detection of Heavy Hitters in Data Streams
【24h】

On Frequency Estimation and Detection of Heavy Hitters in Data Streams

机译:关于数据流中频率估算与检测

获取原文
       

摘要

A stream can be thought of as a very large set of data, sometimes even infinite, which arrives sequentially and must be processed without the possibility of being stored. In fact, the memory available to the algorithm is limited and it is not possible to store the whole stream of data which is instead scanned upon arrival and summarized through a succinct data structure in order to maintain only the information of interest. Two of the main tasks related to data stream processing are frequency estimation and heavy hitter detection. The frequency estimation problem requires estimating the frequency of each item, that is the number of times or the weight with which each appears in the stream, while heavy hitter detection means the detection of all those items with a frequency higher than a fixed threshold. In this work we design and analyze ACMSS, an algorithm for frequency estimation and heavy hitter detection, and compare it against the state of the art AS ketch algorithm. We show that, given the same budgeted amount of memory, for the task of frequency estimation our algorithm outperforms AS ketch with regard to accuracy. Furthermore, we show that, under the assumptions stated by its authors, AS ketch may not be able to report all of the heavy hitters whilst ACMSS will provide with high probability the full list of heavy hitters.
机译:流可以被认为是一组非常大的数据,有时甚至是无限的,该数据依次到达,并且必须在没有存储的可能性的情况下处理。实际上,算法可用的存储器是有限的,并且不可能存储在到达时扫描的整个数据流,并通过简洁的数据结构总结,以便仅维持感兴趣的信息。与数据流处理相关的两个主要任务是频率估计和沉重的击球率检测。频率估计问题需要估计每个项目的频率,即每个项目的频率或每个次数出现在流中的权重,而重的击球率检测意味着检测频率高于固定阈值的所有这些项目。在这项工作中,我们设计和分析ACMS,频率估计和重击中检测算法,并将其与ketch算法的最新状态进行比较。我们展示了,给定相同的预算存储量,对于频率估计的任务,我们的算法在准确率方面优于Ketch。此外,我们表明,在其作者所规定的假设下,由于Ketch可能无法报告所有沉重的击球手,而ACMSS将提供高概率的全部重型列表。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号