...
Journal: Expert Systems with Applications

Projected Outlier Detection In High-dimensional Mixed-attributes Data Set



Abstract

Detecting outliers efficiently is an active research topic in data mining, with important applications in fields such as fraud detection, network intrusion detection, and the monitoring of criminal activity in electronic commerce. Because high-dimensional data are sparse, it is reasonable and meaningful to detect outliers in suitable projected subspaces. We call such a subspace an anomaly subspace and an outlier found in it a projected outlier. Many efficient outlier detection algorithms based on different approaches have been proposed, but little work addresses projected outlier detection in high-dimensional data sets with mixed continuous and categorical attributes. In this paper, a novel algorithm is proposed to detect projected outliers in high-dimensional mixed-attribute data sets. Our main contributions are: (1) a novel measure of anomaly subspaces based on information entropy, in which meaningful outliers can be detected and explained; unlike previous projected outlier detection methods, the dimensionality of the anomaly subspace does not have to be fixed in advance; (2) a theoretical analysis of this measure; (3) a bottom-up method for finding interesting anomaly subspaces; (4) a definition of the outlying degree of a projected outlier that is easy to interpret; (5) support for data sets with mixed data types; and (6) experiments on synthetic and real data sets that evaluate the effectiveness of the approach.
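The abstract does not give the authors' entropy measure, search procedure, or outlying degree, so the following is only a generic illustration of the ideas it names, not the paper's algorithm. It assumes that continuous attributes are discretized into equal-width bins so that mixed attributes share one categorical representation, that a subspace is scored by the Shannon entropy of its joint cells (low entropy meaning the data are concentrated, so rare points stand out), and that a simple cell-frequency score stands in for the outlying degree. All function names (`discretize`, `subspace_entropy`, `bottom_up_subspaces`, `cell_rarity`) and parameters (`bins`, `threshold`, `max_dim`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=4):
    """Equal-width binning (an assumption) so a continuous attribute can be treated as categorical."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def subspace_entropy(columns, dims):
    """Shannon entropy (in bits) of the joint cell distribution over the attributes in dims."""
    n = len(columns[dims[0]])
    cells = Counter(tuple(columns[d][i] for d in dims) for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in cells.values())


def bottom_up_subspaces(columns, threshold, max_dim=3):
    """Grow subspaces one attribute at a time, keeping those with entropy below threshold.

    Joint entropy never decreases when an attribute is added, so if a subspace
    fails the threshold every superset fails too. A candidate is therefore only
    scored when all of its (k-1)-dimensional subsets were kept (Apriori-style pruning).
    """
    attrs = sorted(columns)
    level = {(a,) for a in attrs if subspace_entropy(columns, (a,)) < threshold}
    kept = sorted(level)
    for k in range(2, max_dim + 1):
        # Extend each surviving subspace by one new attribute; tuples stay sorted.
        candidates = {tuple(sorted(set(s) | {a})) for s in level for a in attrs if a not in s}
        level = {
            c for c in candidates
            if all(sub in level for sub in combinations(c, k - 1))
            and subspace_entropy(columns, c) < threshold
        }
        kept.extend(sorted(level))
        if not level:
            break
    return kept


def cell_rarity(columns, dims, row):
    """Fraction of records sharing row's cell in subspace dims (smaller = more outlying)."""
    n = len(columns[dims[0]])
    key = tuple(columns[d][row] for d in dims)
    same = sum(1 for i in range(n) if tuple(columns[d][i] for d in dims) == key)
    return same / n


# Toy mixed-attribute data: one categorical, one continuous, one uninformative attribute.
sizes = [1.0, 1.1, 0.9, 1.2, 1.0, 1.05, 0.95, 9.0]
cols = {
    "color": ["red"] * 7 + ["blue"],
    "size": discretize(sizes),
    "noise": list("abcdefgh"),
}
subspaces = bottom_up_subspaces(cols, threshold=1.0)
print(subspaces)
print(cell_rarity(cols, ("color", "size"), 7))  # 0.125: record 7 sits alone in its cell
```

In this toy data the uniformly distributed `noise` attribute has maximal entropy and is pruned, together with every subspace containing it, while the low-entropy subspace over `color` and `size` is kept and the last record is isolated there. Discretizing continuous attributes is the simplest way to put mixed types on a common footing; the paper may handle them differently.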
