...
首页> 外文期刊>World Wide Web >Fine-grained probability counting for cardinality estimation of data streams
【24h】

Fine-grained probability counting for cardinality estimation of data streams

机译:用于数据流基数估计的细粒度概率计数

获取原文
获取原文并翻译 | 示例

摘要

Estimating the number of distinct flows, also called the cardinality, is an important issue in many network applications, such as traffic measurement, anomaly detection, etc. The challenge is that high accuracy should be achieved with line speed and small auxiliary memory. Flajolet-Martin algorithm, LogLog algorithm, and HyperLogLog algorithm form a line of work in this area with improving performance. In this paper, we propose refined versions of these algorithms to achieve higher accuracy. The key observations are (1) the leftmost hash functions used by these algorithms can be generalized to reach higher accuracy, (2) the amendment coefficient can be highly biased in some certain streams or datasets so dynamically setting the amendment coefficient instead of using the one derived in pure math can lead to much better accuracy. Experimental results show great improvement of accuracy and stability of the refined versions over original algorithms.
机译:估计不同流的数量(也称为基数)是许多网络应用程序(例如流量测量,异常检测等)中的重要问题。挑战在于,应以线速度和较小的辅助存储器来实现高精度。 Flajolet-Martin算法,LogLog算法和HyperLogLog算法构成了该领域的工作线,并提高了性能。在本文中,我们提出了这些算法的改进版本以实现更高的精度。关键的观察结果是(1)这些算法使用的最左侧的哈希函数可以被广义化以达到更高的精度;(2)修正系数在某些特定流或数据集中可能存在高度偏差,因此可以动态设置修正系数而不是使用修正系数用纯数学推导得出的精度更高。实验结果表明,与原始算法相比,改进版本的准确性和稳定性有了很大提高。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号