Journal: 《计算机应用与软件》 (Computer Applications and Software)

Research on Improving the TFIDF-Based Text Feature Weighting Method

Abstract

Traditional TFIDF treats the document set as a whole and does not consider how a feature term is distributed between classes and within a class. To address this shortcoming, an improved TFIDF method that incorporates information entropy is proposed. The method adjusts the TFIDF weight of each feature term using the entropy of the term's distribution between classes and within classes. This avoids assigning large weights to feature terms that contribute nothing to classification, and so computes feature term weights more effectively. Experimental results show that the method improves the precision and recall of text classification and is a relatively effective text feature weighting method.
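
The abstract does not state the weighting formula, so the following Python sketch only illustrates the general idea under stated assumptions. The function names (`entropy`, `entropy_weighted_tfidf`), the two normalized entropy factors, and the way they are multiplied into TFIDF are all hypothetical choices made for this example, not the paper's method. The between-class factor is high when a term is concentrated in few classes; the within-class factor is high when a term is spread evenly over the documents of its class.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def entropy_weighted_tfidf(docs, labels):
    """docs: list of token lists; labels: one class label per document.

    Returns one {term: weight} dict per document. The entropy factors and
    their combination are assumptions; the abstract gives no formula.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    tfs = [Counter(doc) for doc in docs]

    # Document frequency for the classic IDF term.
    df = Counter(term for tf in tfs for term in tf)

    # Term-count vectors of the documents belonging to each class.
    per_class = {c: [tf for tf, y in zip(tfs, labels) if y == c] for c in classes}

    def between_factor(term):
        # Low between-class entropy: term concentrated in few classes.
        if len(classes) < 2:
            return 1.0
        counts = [sum(tf[term] for tf in per_class[c]) for c in classes]
        return 1.0 - entropy(counts) / math.log(len(classes))

    def within_factor(term, c):
        # High within-class entropy: term spread evenly inside the class.
        members = per_class[c]
        if len(members) < 2:
            return 1.0
        return entropy([tf[term] for tf in members]) / math.log(len(members))

    weights = []
    for tf, y in zip(tfs, labels):
        length = sum(tf.values())
        w = {}
        for term, count in tf.items():
            tfidf = (count / length) * math.log(n_docs / df[term])
            # Assumed combination: scale TFIDF by both entropy factors.
            w[term] = tfidf * between_factor(term) * (1.0 + within_factor(term, y))
        weights.append(w)
    return weights


if __name__ == "__main__":
    docs = [["ball", "team", "game"], ["ball", "goal"],
            ["vote", "law", "game"], ["vote", "party"]]
    labels = ["sport", "sport", "politics", "politics"]
    for doc_weights in entropy_weighted_tfidf(docs, labels):
        print(doc_weights)
```

In this toy corpus the term "game" occurs equally often in both classes, so its between-class factor drops to about zero and its weight is suppressed, even though plain TFIDF would give it a positive weight. That is the behavior the abstract describes: terms that do not help separate classes should not receive large weights.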
