Source: Proceedings of the 5th ACM international conference on distributed event-based systems

Space-efficient Tracking of Persistent Items in a Massive Data Stream

Abstract

Motivated by scenarios in network anomaly detection, we consider the problem of detecting persistent items in a data stream, which are items that occur "regularly" in the stream. In contrast with heavy-hitters, persistent items do not necessarily contribute significantly to the volume of a stream, and may escape detection by traditional volume-based anomaly detectors. We first show that any online algorithm that tracks persistent items exactly must necessarily use a large workspace, and is therefore infeasible to run on a traffic monitoring node. In light of this lower bound, we introduce an approximate formulation of the problem and present a small-space algorithm to approximately track persistent items over a large data stream. Our experiments on a real traffic dataset show that in typical cases, the algorithm achieves a physical space compression of 5x-7x, while incurring very few false positives (< 1%) and false negatives (< 4%). To our knowledge, this is the first systematic study of the problem of detecting persistent items in a data stream, and our work can help detect anomalies that are temporal, rather than volume-based.
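
The abstract does not give a formal definition of persistence, so the following is only an illustrative sketch under stated assumptions: the stream is split into equal time slots, and an item counts as persistent if it appears in at least a fraction `alpha` of those slots. The names `exact_persistence`, `persistent_items`, and `alpha` are hypothetical. This is the naive exact tracker, which keeps state for every distinct item (the kind of large workspace the abstract says exact tracking requires); it is not the authors' small-space algorithm.

```python
from collections import Counter


def exact_persistence(stream, num_slots):
    """Return, per item, the fraction of time slots in which it appears.

    `stream` is an iterable of (slot, item) pairs, assumed ordered by slot.
    Memory grows with the number of distinct items: two entries per item.
    """
    last_slot = {}          # item -> most recent slot in which it was seen
    slots_seen = Counter()  # item -> number of distinct slots seen so far
    for slot, item in stream:
        if last_slot.get(item) != slot:  # first occurrence in this slot
            last_slot[item] = slot
            slots_seen[item] += 1
    return {item: n / num_slots for item, n in slots_seen.items()}


def persistent_items(stream, num_slots, alpha):
    """Items that occur in at least an `alpha` fraction of the slots."""
    fractions = exact_persistence(stream, num_slots)
    return {item for item, frac in fractions.items() if frac >= alpha}


# Toy stream over 4 slots: "b" has the largest volume, "a" is the regular one.
stream = [
    (0, "a"), (0, "b"), (0, "b"), (0, "b"),
    (1, "a"), (1, "b"), (1, "b"),
    (2, "a"),
    (3, "a"), (3, "c"),
]

volume = Counter(item for _, item in stream)
print(volume.most_common(1))               # heaviest item by volume
print(persistent_items(stream, 4, 0.75))   # items seen in >= 75% of slots
```

In this toy stream the item with the most occurrences appears in only half of the slots, while the item that appears in every slot has lower volume, which is the distinction between heavy-hitters and persistent items drawn in the abstract.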