Journal: 计算机应用与软件 (Computer Applications and Software)

A Text Feature Selection Algorithm Based on Feature Weight and Inter-Word Correlation

     

Abstract

The traditional ReliefF algorithm measures feature differences with a binary function, so it cannot reflect how much discrete feature values differ, and it cannot remove redundant features. To address this, the mRMR-ReliefF feature selection algorithm is proposed. The algorithm uses probabilities to compensate for the weakness of the difference measure and introduces a new difference function, so that the extracted features better reflect intra-class relevance and inter-class differences among texts. The algorithm also incorporates inter-word correlation, which selects feature words strongly correlated with the class while eliminating feature redundancy. Comparative experiments with three algorithms show that the proposed algorithm provides a more effective feature subset for text classification.
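The abstract does not give the paper's difference function or its exact combination rule, so the following is only a minimal sketch of the general idea of pairing a per-feature relevance weight with a redundancy penalty, in the style of mRMR. Everything here is an assumption for illustration: the function names `mutual_information` and `mrmr_select` are hypothetical, the relevance scores are taken as given (they could come from a ReliefF-style weighting), redundancy is measured as the mutual information between discrete feature columns, and the score is the common "relevance minus mean redundancy" form.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, relevance, k):
    """Greedily pick up to k feature indices.

    columns   : list of feature columns (one discrete sequence per feature)
    relevance : per-feature relevance weights (assumed precomputed,
                e.g. by a ReliefF-style weighting; not computed here)
    Each step picks the feature maximizing
    relevance minus mean mutual information with the features already chosen.
    """
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

As a usage illustration with made-up data: if two term columns are identical and a third is only weakly related to them, calling `mrmr_select` with `k=2` keeps one of the duplicates and then prefers the third column, even when the second duplicate has the higher relevance weight. This is the redundancy-elimination behavior the abstract attributes to inter-word correlation, shown here under the assumptions stated above rather than with the paper's own formulas.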
