Journal: IEEE Transactions on Network Science and Engineering

FAST-ODT: A Lightweight Outlier Detection Scheme for Categorical Data Sets

Abstract

Outlier detection is a key data analysis technique that aims to find unusual data objects in a data set. It is widely used in areas such as communication networks, finance, medicine, and environmental studies. Many applications in these areas involve categorical data. For example, the data set used for intrusion detection normally consists of captured packets, which tend to have categorical attributes such as "protocol". Although many outlier detection algorithms exist for numerical data, only a few existing schemes can handle categorical data, and those schemes suffer from two serious problems: low detection precision and high time complexity. In this paper, we present two novel outlier detection algorithms for categorical data sets. First, we describe a simple entropy-based scheme, Outlier Detection Tree (ODT). With ODT, a classification tree is constructed to divide the data set into two classes: a normal class and an abnormal class. Each data object is then identified as an outlier or a normal object using the if-then rules in the tree. Second, we propose an advanced outlier detection algorithm, FAST-ODT, which achieves both high detection accuracy and low time complexity. Our experimental results indicate that FAST-ODT outperforms the existing algorithms in terms of outlier detection precision and computational complexity.
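
The abstract does not give the details of ODT or FAST-ODT, so the following Python sketch is not the authors' algorithm. It only illustrates, under stated assumptions, the two generic ideas the abstract relies on: the Shannon entropy of a categorical attribute, and flagging records whose attribute values are rare. The function names (attribute_entropy, rarity_scores), the scoring rule, and the sample packet records are all hypothetical.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def rarity_scores(records):
    """Score each record by summing -log2(relative frequency) over its
    attribute values. Rare values give higher scores.

    This is a generic frequency-based heuristic chosen for illustration,
    not the ODT or FAST-ODT scoring rule.
    """
    n = len(records)
    # One frequency table per attribute (column).
    freq = [Counter(col) for col in zip(*records)]
    return [
        sum(-log2(freq[j][value] / n) for j, value in enumerate(rec))
        for rec in records
    ]


# Hypothetical captured packets described by (protocol, connection flag).
packets = [
    ("tcp", "SF"),
    ("tcp", "SF"),
    ("tcp", "SF"),
    ("udp", "SF"),
    ("icmp", "REJ"),
]

protocol_entropy = attribute_entropy([p[0] for p in packets])
scores = rarity_scores(packets)
most_unusual = packets[scores.index(max(scores))]
```

In this toy data the record with the rarest values in both attributes receives the highest score. A tree-based scheme such as the one the abstract describes would instead derive if-then rules from the data, which this sketch does not attempt.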