Conference: IEEE International Conference on Big Data

Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors

Abstract

Cyber operations now manage a high volume of heterogeneous log data. Anomaly Detection (AD) in such operations involves multiple (e.g., per IP, per data type) ensembles of detectors modeling heterogeneous characteristics (e.g., rate, size, type), often with adaptive online models producing alerts in near real time. Because of the high data volume, setting the threshold for each detector in such a system is an essential yet underdeveloped configuration issue: a slightly mistuned threshold can leave the system useless, either producing a myriad of alerts (and flooding downstream systems) or producing none. In this work, we build on the foundations of Ferragut et al. to provide a set of rigorous results for understanding the relationship between threshold values and alert quantities for probabilistic detectors. This informs an algorithm for setting the thresholds of multiple, heterogeneous, possibly dynamic detectors completely a priori, in principle. Indeed, if the underlying distribution of the incoming data is known, the algorithm provides provably manageable thresholds. If the distribution is unknown or poorly estimated, our analysis gives insight into how the model distribution differs from the actual distribution, indicating that refitting is necessary. We provide empirical experiments regulating the alert rate of a system with ≈2,500 adaptive detectors scoring over 1.5M events spanning 5 hours of timestamps. Further, using real network data and the detection framework of Harshaw et al., we demonstrate the alternative case: an inability to regulate alerts indicates that the detection model is not a good fit to the data.
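The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of tying a threshold to an alert quantity, not the paper's method. It assumes a detector that alerts when an event's p-value under a known discrete model (the total probability of outcomes no more likely than the observed one) is at or below a threshold `alpha`. The toy distribution `pmf` and all function names are hypothetical. Under these assumptions, the alert probability is at most `alpha` when events really follow the model, so the expected alert count over N events is at most `alpha * N`.

```python
import random

def p_value(pmf, x):
    """Model probability of seeing an outcome no more likely than x."""
    px = pmf[x]
    return sum(p for p in pmf.values() if p <= px)

def exact_alert_probability(pmf, alpha):
    """Probability that one event drawn from pmf triggers an alert."""
    return sum(p for x, p in pmf.items() if p_value(pmf, x) <= alpha)

def simulated_alert_rate(pmf, alpha, n_events, seed=0):
    """Fraction of simulated events (drawn from pmf) that alert at threshold alpha."""
    rng = random.Random(seed)
    outcomes = list(pmf)
    weights = [pmf[o] for o in outcomes]
    events = rng.choices(outcomes, weights=weights, k=n_events)
    alerts = sum(1 for e in events if p_value(pmf, e) <= alpha)
    return alerts / n_events

# Hypothetical toy model of one detector's event distribution (assumption).
pmf = {"a": 0.90, "b": 0.06, "c": 0.03, "d": 0.01}
alpha = 0.05

exact = exact_alert_probability(pmf, alpha)        # only "c" and "d" alert
rate = simulated_alert_rate(pmf, alpha, 100_000)   # empirical counterpart

print(f"exact alert probability: {exact:.4f} (threshold {alpha})")
print(f"simulated alert rate:    {rate:.4f}")
```

In this sketch, a simulated alert rate well above `alpha` would suggest that the events are not distributed as the model assumes, which mirrors the abstract's point that an inability to regulate alerts signals a poorly fitting model.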
