首页> 外文期刊>World Wide Web >Novel structures for counting frequent items in time decayed streams
【24h】

Novel structures for counting frequent items in time decayed streams

机译:新颖的结构,用于统计时间衰减流中的频繁项目

获取原文
获取原文并翻译 | 示例
           

摘要

Identifying frequently occurring items is a fundamental building block in many data stream applications. A great deal of work for efficiently identifying frequent items has been studied on the landmark and sliding window models. In this work, we revisit this problem on a new streaming model based on the time decay, where the importance of every arrival item is decreased over the time. To address the importance changes over time, we propose an innovative heap structure, named Quasi-heap, which maintains the item order using a lazy update mechanism. Two approximation algorithm, Space Saving with Quasi-heap (SSQ) and Filtered Space Saving with Quasi-heap (FSSQ), are proposed to find the frequently occurring items based on the Quasi-heap structure. To achieve better accuracy of frequency estimation for all the items in the stream, we introduce a new count-min-min (CMM) sketch structure, which can estimate the count of an item with almost error free. Extensive experiments conducted on both real-world and synthetic data demonstrate the superiority of proposed methods in terms of both efficiency (i.e., response time) and effectiveness (i.e., accuracy).
机译:识别频繁出现的项目是许多数据流应用程序中的基本构建块。在地标和滑动窗口模型上,已经进行了大量工作以有效地识别频繁项。在这项工作中,我们将基于时间衰减在新的流模型上重新审视此问题,其中每个到达项的重要性都会随着时间的推移而降低。为了解决随时间变化的重要性,我们提出了一种创新的堆结构,称为准堆,该堆结构使用惰性更新机制来维护项目顺序。提出了两种近似算法,即准堆节省空间(SSQ)和滤波后的准堆节省空间(FSSQ),以基于准堆结构查找频繁出现的项目。为了实现对流中所有项目的频率估计的更好准确性,我们引入了一种新的最小计数最小(CMM)草图结构,该结构可以几乎无错误地估计一个项目的计数。在真实和合成数据上进行的大量实验证明了所提方法在效率(即响应时间)和有效性(即准确性)方面的优越性。

著录项

  • 来源
    《World Wide Web》 |2017年第5期|1111-1133|共23页
  • 作者单位

    Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China;

    Univ Macau, Fac Sci & Technol, Macau, Peoples R China;

    Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China;

  • 收录信息
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Frequent item; Data stream; Time decay model; Data structure;

    机译:频繁项;数据流;时间衰减模型;数据结构;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号