Published in: 《计算机应用与软件》 (Computer Applications and Software)

Research on a Mutual Information Text Feature Selection Algorithm Based on Quadratic TF*IDF

Abstract

Building on an analysis of the shortcomings of the traditional mutual information algorithm, this paper proposes a mutual information text feature selection algorithm based on quadratic TF*IDF. The new algorithm re-measures the importance of feature words that appear in only one category, and so solves the problem that feature selection cannot be performed effectively when mutual information values are equal. Experiments with a Bayesian classifier show that the algorithm improves text classification efficiency and accuracy to some extent over the original method.
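The tie problem mentioned in the abstract can be illustrated with a small sketch. With pointwise mutual information MI(t, c) = log(P(t, c) / (P(t) P(c))), any word that occurs in only one category has P(t, c) = P(t), so its score collapses to log(1 / P(c)) no matter how often it occurs. The Python below is only an illustration under stated assumptions, not the paper's method: the abstract does not give the exact weighting formula, so a plain corpus-level TF*IDF is used here as the secondary ranking key, probabilities are estimated from document frequencies, and the function names (`mutual_information`, `tf_idf`, `select_features`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs, labels):
    """Pointwise MI(t, c) = log(P(t, c) / (P(t) * P(c))), from document frequencies."""
    n = len(docs)
    class_count = Counter(labels)
    df = Counter()                # number of documents containing t
    df_c = defaultdict(Counter)   # number of documents of class c containing t
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_c[c][t] += 1
    mi = {}
    for c, terms in df_c.items():
        for t, k in terms.items():
            mi[(t, c)] = math.log((k / n) / ((df[t] / n) * (class_count[c] / n)))
    return mi


def tf_idf(docs):
    """Assumed secondary weight: total term frequency times log(N / df)."""
    n = len(docs)
    tf = Counter(t for tokens in docs for t in tokens)
    df = Counter(t for tokens in docs for t in set(tokens))
    return {t: tf[t] * math.log(n / df[t]) for t in tf}


def select_features(docs, labels, k):
    """Rank words by their best MI over classes, breaking ties with TF*IDF."""
    mi = mutual_information(docs, labels)
    weight = tf_idf(docs)
    best = {}
    for (t, _c), v in mi.items():
        best[t] = max(best.get(t, float("-inf")), v)
    # Rounding groups MI values that differ only by floating-point noise,
    # so the TF*IDF weight decides the order among tied words.
    ranked = sorted(best, key=lambda t: (round(best[t], 12), weight[t], t), reverse=True)
    return ranked[:k]


docs = [
    ["goal", "goal", "match"],
    ["goal", "team"],
    ["vote", "party"],
    ["vote", "vote", "vote", "law"],
]
labels = ["sport", "sport", "politics", "politics"]
print(select_features(docs, labels, 2))
```

In this toy corpus every word occurs in a single category, so all MI scores are identical and MI alone cannot rank them; the secondary weight is what separates frequent words such as "vote" and "goal" from the rest.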
