ACM Computing Surveys

Anomaly Detection Methods for Categorical Data: A Review


Abstract

Anomaly detection has numerous applications in diverse fields. For example, it has been widely used for discovering network intrusions and malicious events. It has also been used in numerous other applications, such as identifying medical malpractice or credit fraud. Detection of anomalies in quantitative data has received considerable attention in the literature and has a venerable history. By contrast, and despite the widespread availability and use of categorical data in practice, anomaly detection in categorical data has received relatively little attention. This is because detection of anomalies in categorical data is a challenging problem. Some anomaly detection techniques depend on identifying a representative pattern and then measuring distances between objects and this pattern. Objects that are far from this pattern are declared anomalies. However, identifying patterns and measuring distances is harder in categorical data than in quantitative data. Fortunately, several papers focusing on the detection of anomalies in categorical data have been published in the recent literature. In this article, we provide a comprehensive review of the research on the anomaly detection problem in categorical data. Previous review articles focus on either the statistics literature or the machine learning and computer science literature. This review article combines both literatures. We review 36 methods for the detection of anomalies in categorical data in both literatures and classify them into 12 different categories based on the conceptual definition of anomalies they use. For each approach, we survey anomaly detection methods and then show the similarities and differences among them. We emphasize two important issues: the number of parameters each method requires and its time complexity. The first issue is critical because the performance of these methods is sensitive to the choice of these parameters. Time complexity is also very important in real applications, especially big data applications. We report the time complexity if it is reported by the authors of the methods. If it is not, then we derive it ourselves and report it in this article. In addition, we discuss common problems and future directions of anomaly detection in categorical data.
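
The pattern-and-distance idea mentioned in the abstract can be illustrated with a minimal sketch. It is not one of the 36 reviewed methods. It assumes, purely for illustration, that the representative pattern is the per-attribute mode and that distance is the Hamming distance (the number of mismatching attributes). The function names and the toy records are hypothetical.

```python
from collections import Counter

def mode_pattern(rows):
    """Return the most frequent value of each attribute (column)."""
    columns = zip(*rows)
    return tuple(Counter(col).most_common(1)[0][0] for col in columns)

def hamming_distance(row, pattern):
    """Count the attributes on which a row differs from the pattern."""
    return sum(1 for value, ref in zip(row, pattern) if value != ref)

def anomaly_scores(rows):
    """Score each row by its Hamming distance to the mode pattern."""
    pattern = mode_pattern(rows)
    return [hamming_distance(row, pattern) for row in rows]

# Hypothetical toy records: (protocol, service, flag)
rows = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "smtp", "SF"),
    ("udp", "dns", "REJ"),
]
scores = anomaly_scores(rows)
print(scores)  # [0, 0, 1, 3]
```

Under these assumptions the sketch needs no user-set parameters and runs in time proportional to the number of objects times the number of attributes, which are the two properties the abstract singles out for comparison.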
