【24h】

Finding Heavy Distinct Hitters in Data Streams

机译:在数据流中发现严重的问题

获取原文
获取原文并翻译 | 示例

摘要

A simple indicator for an anomaly in a network is a rapid increase in the total number of distinct network connections. While it is fairly easy to maintain an accurate estimate of the current total number of distinct connections using streaming algorithms that exhibit both a low space and computational complexity, identifying the network entities that are involved in the largest number of distinct: connections efficiently is considerably harder. In this paper, we study the problem of finding all entities whose number of distinct (outgoing or incoming) network connections is at least a specific fraction of the total number of distinct connections. These entities are referred to as heavy distinct hitters. Since this problem is hard in general, we focus on randomized approximation techniques and propose a sampling-based and a sketch-based streaming algorithm. Both algorithms output a list of the potential heavy distinct hitters including the estimated counts of the corresponding number of distinct connections. We prove that, depending on the required level of accuracy of the output list, the space complexities of the presented algorithms are asymptotically optimal up to small logarithmic factors. Additionally, the algorithms are evaluated and compared using real network data in order to determine their usefulness in practice.
机译:网络异常的一个简单指标是不同网络连接总数的迅速增加。使用显示空间少和计算复杂度高的流式算法很容易维护当前不同连接的总数的准确估计,但要确定涉及最大数量的不同连接的网络实体:有效地进行连接要困难得多。在本文中,我们研究发现以下问题:找到所有具有不同(发送或传入)网络连接数至少是不同连接总数的特定分数的实体。这些实体被称为重磅不同的击球手。由于这个问题通常很难解决,因此我们将重点放在随机逼近技术上,并提出一种基于采样和基于草图的流算法。两种算法都输出潜在的重磅不同击球手的列表,其中包括对应数量的不同连接的估计计数。我们证明,根据输出列表的准确性要求,所提出算法的空间复杂度在对数较小的情况下是渐近最优的。另外,使用实际网络数据对算法进行评估和比较,以确定其在实践中的实用性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号