Joint Feature Selection Method Based on Relevance and Redundancy

Published in: Computer Science (《计算机科学》), a Chinese-language journal


Abstract

We compare four feature selection methods: document frequency (DF), which does not use class information, and information gain (IG), mutual information (MI) and the chi-square statistic (CHI, χ²), which do. Building on this comparison, we analyze the drawbacks of earlier approaches that combine these two kinds of methods directly. We then propose a joint feature selection algorithm based on relevance and redundancy, which combines DF with each of IG, MI and CHI in turn. The algorithm aims to eliminate redundant features while retaining features that are useful for classification, and thereby to improve the accuracy of text sentiment classification. Experimental results show that the proposed method performs well and effectively reduces feature dimensionality.
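The abstract names four standard scoring functions but gives no formulas. The following is a minimal Python sketch of their common textbook definitions. It uses per-class document counts A (in class c, contains term t), B (other classes, contains t), C (in class c, lacks t) and D (other classes, lacks t), with N = A + B + C + D. Several choices here are assumptions and do not come from the paper: documents are represented as token sets, MI is the pointwise form log(AN / ((A+C)(A+B))), class-specific CHI and MI scores are reduced to one score per term by taking the maximum over classes, and all function names are hypothetical. The `min_df` pre-filter in `rank_terms` only illustrates the kind of direct DF-then-score combination that the abstract criticizes. The authors' relevance/redundancy algorithm is not described in the abstract, and this sketch does not reproduce it.

```python
import math
from collections import Counter


def contingency(docs, labels, term, cls):
    """Document counts (A, B, C, D) for one term/class pair.

    A: in class `cls`, contains term    B: other class, contains term
    C: in class `cls`, lacks term       D: other class, lacks term
    """
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        has_term = term in tokens
        if y == cls:
            if has_term:
                a += 1
            else:
                c += 1
        elif has_term:
            b += 1
        else:
            d += 1
    return a, b, c, d


def df_score(docs, term):
    """Document frequency: number of documents containing the term (ignores labels)."""
    return sum(1 for tokens in docs if term in tokens)


def chi_square(docs, labels, term):
    """CHI(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); max over classes (assumption)."""
    best = 0.0
    for cls in set(labels):
        a, b, c, d = contingency(docs, labels, term, cls)
        n = a + b + c + d
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        if denom:
            best = max(best, n * (a * d - c * b) ** 2 / denom)
    return best


def mutual_information(docs, labels, term):
    """Pointwise MI(t, c) = log(A*N / ((A+C)(A+B))); max over classes (assumption)."""
    best = float("-inf")
    for cls in set(labels):
        a, b, c, d = contingency(docs, labels, term, cls)
        if a:  # log(0) is undefined, so a class that never contains the term is skipped
            n = a + b + c + d
            best = max(best, math.log(a * n / ((a + c) * (a + b))))
    return best


def _entropy(items):
    """Shannon entropy (natural log) of the label distribution in `items`."""
    total = len(items)
    return -sum((k / total) * math.log(k / total) for k in Counter(items).values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    n = len(labels)
    if n == 0:
        return 0.0
    with_t = [y for tokens, y in zip(docs, labels) if term in tokens]
    without_t = [y for tokens, y in zip(docs, labels) if term not in tokens]
    conditional = (len(with_t) / n) * _entropy(with_t) + (len(without_t) / n) * _entropy(without_t)
    return _entropy(labels) - conditional


def rank_terms(docs, labels, scorer, k, min_df=1):
    """Top-k terms by `scorer`, after discarding terms whose DF is below `min_df`.

    This DF-then-score cascade is only a baseline illustration (hypothetical helper),
    not the paper's relevance/redundancy algorithm. Ties are broken alphabetically.
    """
    vocab = set().union(*docs)
    candidates = [t for t in vocab if df_score(docs, t) >= min_df]
    return sorted(candidates, key=lambda t: (-scorer(docs, labels, t), t))[:k]
```

Take `docs = [{"good", "movie"}, {"good", "great", "movie"}, {"bad", "movie"}, {"bad", "plot"}]` and `labels = ["pos", "pos", "neg", "neg"]`. With these inputs, `rank_terms(docs, labels, chi_square, 2)` returns `["bad", "good"]`, the two terms that separate the classes perfectly. The same toy data also shows the known bias of pointwise MI toward rare terms: `plot` occurs in only one document, yet it receives the same MI score as `good`.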

