首页> 外文期刊>Neurocomputing >CVS: Fast cardinality estimation for large-scale data streams over sliding windows
【24h】

CVS: Fast cardinality estimation for large-scale data streams over sliding windows

机译:CVS:滑动窗口上大规模数据流的快速基数估计

获取原文
获取原文并翻译 | 示例

摘要

Estimating the cardinality of data streams over a sliding window is an important problem in many applications, such as network traffic monitoring, web access log analysis and database. The problem becomes more difficult in large-scale data streams when time and space complexity is taken into account. In this paper, we present a novel randomized data structure to address the problem. The significant contributions are as follows: (1) A space-efficient counter vector sketch (CVS) are proposed, which extends the well-known bitmap sketch to sliding window settings. (2) Based on the CVS, a random update mechanism is introduced, whereby a small fixed number of entries are randomly chosen from CVS in a step and then updated. This means that the update procedure just costs constant time. (3) Furthermore, estimating cardinality by CVS just needs one-pass scan of the data. (4) Finally, a theoretical analysis is given to show the accuracy of CVS-based estimators. Our comprehensive experiments confirm that the CVS-based schema attains high accuracy, and demonstate its time efficiency in comparison with the Timestamp Vector (TSV) and the auxiliary indexing method. (C) 2016 Elsevier B.V. All rights reserved.
机译:估计滑动窗口上数据流的基数是许多应用程序中的重要问题,例如网络流量监控,Web访问日志分析和数据库。考虑到时间和空间的复杂性,在大规模数据流中问题变得更加困难。在本文中,我们提出了一种新颖的随机数据结构来解决该问题。主要的贡献如下:(1)提出了一种节省空间的计数器矢量草图(CVS),它将众所周知的位图草图扩展到滑动窗口设置。 (2)基于CVS,引入了随机更新机制,由此从一步中从CVS中随机选择少量固定条目,然后进行更新。这意味着更新过程仅花费恒定的时间。 (3)此外,通过CVS估计基数只需要对数据进行一次扫描。 (4)最后,进行了理论分析以表明基于CVS的估计量的准确性。我们的综合实验证实,基于CVS的方案具有较高的准确性,并与时间戳矢量(TSV)和辅助索引方法相比,证明了其时间效率。 (C)2016 Elsevier B.V.保留所有权利。

著录项

  • 来源
    《Neurocomputing》 |2016年第19期|107-116|共10页
  • 作者单位

    PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China|Huaiyin Inst Technol, Fac Comp Engn, Huaian, Peoples R China;

    PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China;

    PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China;

    PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China;

    PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Data stream; Sliding window; Counter vector sketch; Cardinality; Random update;

    机译:数据流;滑动窗口;计数器矢量草图;基数;随机更新;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号