The relevance measures commonly used in current feature selection methods can effectively evaluate the relevance between two features, but they treat the features in isolation and ignore the influence of the other features on that relevance. Considering the relationships among features as a whole, this paper proposes the sparse representation coefficient as a feature relevance measure. Unlike existing relevance measures, it reveals the relevance between a feature and the target under the influence of all other features, and so reflects feature interaction. To verify the effectiveness of the sparse representation coefficient as a relevance measure, the classification performance of the feature subsets selected by ReliefF and by feature selection methods using the sparse representation coefficient, symmetrical uncertainty, and the Pearson correlation coefficient as relevance measures is compared on typical high-dimensional, small-sample data sets. The experimental results show that the feature subsets selected by the proposed method achieve higher and more stable classification performance.
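The contrast between a pairwise measure and a sparse representation coefficient can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so the code below rests on assumptions: the target is represented as an L1-penalized linear combination of all features (a Lasso-style objective solved by coordinate descent), and the magnitude of each coefficient is read as that feature's relevance. The function names, the `penalty` value, and the toy data (a continuous target rather than class labels) are all hypothetical and chosen only for illustration.

```python
import math
import random


def standardize(values):
    """Center a column and scale it to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n  # a constant column carries no information
    return [(v - mean) / std for v in values]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_coefficients(columns, target, penalty=0.05, sweeps=200):
    """Coordinate descent for (1/2n)*||y - sum_j w_j*x_j||^2 + penalty*sum_j |w_j|.

    Assumption: columns and target are already standardized.
    """
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)  # equals y - Xw while all weights are zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue
            # Covariance of feature j with the residual that leaves feature j out.
            rho = sum(c * (r + weights[j] * c) for c, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, penalty) / scale
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return weights


def pearson(column, target):
    """Pearson correlation of two standardized sequences."""
    return sum(c * t for c, t in zip(column, target)) / len(target)


def make_toy_data(n=200, seed=0):
    """Hypothetical data: x1 is a noisy copy of x0, x3 is pure noise, y = x0 + x2."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [a + 0.5 * rng.gauss(0, 1) for a in x0]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + b for a, b in zip(x0, x2)]
    return [standardize(c) for c in (x0, x1, x2, x3)], standardize(y)


columns, target = make_toy_data()
weights = sparse_coefficients(columns, target)
for j, col in enumerate(columns):
    print(f"feature {j}: pearson={pearson(col, target):+.3f}  sparse={weights[j]:+.3f}")
```

In this toy setup the noisy copy `x1` correlates strongly with the target when examined on its own, yet it should receive little or no weight once `x0` is available to the joint representation. That is the kind of feature interaction a purely pairwise measure cannot express.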