Anomaly Pattern Detection in Categorical Datasets

Abstract

We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process which affects only a particular subset of the data. Our method consists of two steps: we first use a "local anomaly detector" to identify individual records with anomalous attribute values, and then detect patterns where the number of anomalous records is higher than expected. Given the set of anomalies flagged by the local anomaly detector, we search over all subsets of the data defined by any set of fixed values of a subset of the attributes, in order to detect self-similar patterns of anomalies. We wish to detect any such subset of the test data which displays a significant increase in anomalous activity as compared to the normal behavior of the system (as indicated by the training data). We perform significance testing to determine if the number of anomalies in any subset of the test data is significantly higher than expected, and propose an efficient algorithm to perform this test over all such subsets of the data. We show that this algorithm is able to accurately detect anomalous patterns in real-world hospital, container shipping and network intrusion data.
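The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not name the significance test or the efficient search, so the one-sided binomial tail test, the Laplace-smoothed training rate, the Bonferroni threshold, the `max_attrs` cap and every function name here are hypothetical choices made for the example. The local anomaly detector is represented by a caller-supplied predicate `is_anomalous`.

```python
from itertools import combinations
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_anomalous_patterns(train, test, attrs, is_anomalous, max_attrs=2, alpha=0.05):
    """Brute-force sketch: flag records, then test every attribute-value subset.

    train, test  -- lists of dict records
    attrs        -- attribute names used to define subsets
    is_anomalous -- stand-in for the local anomaly detector (record -> bool)
    Returns (pattern, anomalies_in_test, test_size, p_value) tuples, most
    significant first. The binomial test is an assumption of this sketch.
    """
    # Step 1: flag individual records with the local anomaly detector.
    train_flags = [int(is_anomalous(r)) for r in train]
    test_flags = [int(is_anomalous(r)) for r in test]

    # Step 2: enumerate subsets defined by fixed values of up to max_attrs attributes.
    candidates = []
    for size in range(1, max_attrs + 1):
        for subset in combinations(attrs, size):
            # counts[key] = [train size, train anomalies, test size, test anomalies]
            counts = {}
            for records, flags, offset in ((train, train_flags, 0), (test, test_flags, 2)):
                for rec, flag in zip(records, flags):
                    key = tuple(rec[a] for a in subset)
                    c = counts.setdefault(key, [0, 0, 0, 0])
                    c[offset] += 1
                    c[offset + 1] += flag
            for key, (n_train, a_train, n_test, a_test) in counts.items():
                if n_test == 0:
                    continue
                # Expected anomaly rate from training data, Laplace-smoothed.
                p0 = (a_train + 1) / (n_train + 2)
                p_value = binomial_tail(a_test, n_test, p0)
                candidates.append((tuple(zip(subset, key)), a_test, n_test, p_value))

    # Bonferroni correction across all subsets tested (a simple, conservative choice).
    threshold = alpha / max(len(candidates), 1)
    hits = [c for c in candidates if c[3] < threshold]
    return sorted(hits, key=lambda c: c[3])
```

Each returned pattern is a tuple of `(attribute, value)` pairs, for example `(("port", "A"), ("proto", "x"))` for hypothetical attributes `port` and `proto`. The exhaustive enumeration shown here grows quickly with the number of attributes, which is presumably why the abstract proposes an efficient algorithm for testing all such subsets.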