
Unsupervised feature selection by regularized self-representation


Abstract

By removing irrelevant and redundant features, feature selection aims to find a compact representation of the original features with good generalization ability. With the prevalence of unlabeled data, unsupervised feature selection has been shown to be effective in alleviating the curse of dimensionality, and it is essential for the comprehensive analysis and understanding of myriads of unlabeled high-dimensional data. Motivated by the success of low-rank representation in subspace clustering, we propose a regularized self-representation (RSR) model for unsupervised feature selection, in which each feature is represented as a linear combination of its relevant features. By using the L2,1-norm to characterize both the representation coefficient matrix and the representation residual matrix, RSR effectively selects representative features while remaining robust to outliers. If a feature is important, it participates in the representation of most other features, yielding a significant row of representation coefficients, and vice versa. Experimental analysis on synthetic and real-world data demonstrates that the proposed method can effectively identify representative features, outperforming many state-of-the-art unsupervised feature selection methods in terms of clustering accuracy, redundancy reduction, and classification accuracy.
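The self-representation idea in the abstract can be sketched numerically: minimize ||X - XW||_{2,1} + λ||W||_{2,1} over the coefficient matrix W, then rank features by the L2 norm of the rows of W (a significant row means the feature helps represent most others). The sketch below is a minimal, hedged illustration using a standard iteratively reweighted least-squares scheme; the initialization, iteration count, and parameter names are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def rsr_feature_selection(X, lam=1.0, n_iter=50, eps=1e-6):
    """Sketch of regularized self-representation (RSR) feature selection.

    Approximately solves  min_W ||X - X W||_{2,1} + lam * ||W||_{2,1}
    by iteratively reweighted least squares, then scores feature i by
    the L2 norm of row i of W.

    X : (n_samples, n_features) data matrix.
    Returns (order, scores): feature indices sorted by descending
    importance, and the per-feature row-norm scores.
    """
    n, d = X.shape
    # Ridge-style initialization (an assumption; avoids the trivial W = I).
    XtX = X.T @ X
    W = np.linalg.solve(XtX + lam * np.eye(d), XtX)
    for _ in range(n_iter):
        E = X - X @ W  # representation residual, one row per sample
        # Reweighting terms for the two L2,1 norms (eps guards division by 0).
        ge = 1.0 / (2.0 * np.maximum(np.linalg.norm(E, axis=1), eps))
        gw = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        # Weighted least-squares update of W.
        XtGX = X.T @ (ge[:, None] * X)
        W = np.linalg.solve(XtGX + lam * np.diag(gw), XtGX)
    scores = np.linalg.norm(W, axis=1)  # significant rows => important features
    return np.argsort(-scores), scores
```

In this sketch, redundant features (linear combinations of others) tend to receive small row norms because the informative features already reconstruct them, which is the selection behavior the abstract describes.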
