IEEE International Conference on Data Mining

Unsupervised Feature Selection for Outlier Detection by Modelling Hierarchical Value-Feature Couplings

Abstract

Proper feature selection for unsupervised outlier detection can improve detection performance but is very challenging due to complex feature interactions, the mixture of relevant features with noisy/redundant features in imbalanced data, and the unavailability of class labels. Little work has been done on this challenge. This paper proposes a novel Coupled Unsupervised Feature Selection framework (CUFS for short) to filter out noisy or redundant features for subsequent outlier detection in categorical data. CUFS quantifies the outlierness (or relevance) of features by learning and integrating both the feature value couplings and feature couplings. Such value-to-feature couplings capture intrinsic data characteristics and distinguish relevant features from noisy/redundant features. CUFS is further instantiated into a parameter-free Dense Subgraph-based Feature Selection method, called DSFS. We prove that DSFS retains a feature subset that is a 2-approximation to the optimal subset. Extensive evaluation results on 15 real-world data sets show that DSFS obtains an average 48% feature reduction rate, and enables three different types of pattern-based outlier detection methods to achieve substantially better AUC and/or perform orders of magnitude faster than on the original feature set. Compared to its feature selection contender, on average, all three DSFS-based detectors achieve more than 20% AUC improvement.
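The abstract mentions a dense-subgraph formulation with a 2-approximation guarantee but gives no algorithmic detail. As a rough illustration only, the sketch below shows greedy peeling, a standard technique known to give a 2-approximation for the densest-subgraph problem. It is an assumption that this resembles what DSFS does. The function name `greedy_densest_subset`, the density definition (total internal edge weight divided by number of nodes), and the toy coupling weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def greedy_densest_subset(weights: Sequence[Sequence[float]]) -> List[int]:
    """Greedy peeling on a weighted undirected graph (symmetric matrix).

    Assumed density of a node set S: (sum of edge weights inside S) / |S|.
    Repeatedly drops the node with the smallest weighted degree and returns
    the densest intermediate node set seen along the way.
    """
    n = len(weights)
    alive = set(range(n))
    # Weighted degree of each node, ignoring self-loops.
    degree = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(degree) / 2.0  # each edge was counted twice
    best_density = total / n if n else 0.0
    best_set = sorted(alive)
    while len(alive) > 1:
        # Peel the weakest node; ties are broken by the lower index.
        victim = min(alive, key=lambda i: (degree[i], i))
        alive.remove(victim)
        total -= degree[victim]
        for j in alive:
            degree[j] -= weights[victim][j]
        density = total / len(alive)
        if density > best_density:
            best_density = density
            best_set = sorted(alive)
    return best_set


# Hypothetical toy graph: features 0-2 are strongly coupled, feature 3 is not.
toy = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
selected = greedy_densest_subset(toy)
```

On this toy input the weakly coupled feature 3 is peeled first, and the remaining three features form the densest set. In a real setting the edge weights would come from the learned value and feature couplings, which this listing does not specify.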
