Research on a Multi-Class Text Classification Algorithm with Dynamic Self-Adaptive Term Weighting

     

Abstract

Text classification is an important tool in text data mining and information retrieval, and computing term weights is the key step in any text classification algorithm. To address the shortcomings of the classical term weighting method TF-IDF, this paper proposes a dynamic self-adaptive term weighting method (DATW) for multi-class text classification. DATW considers not only the frequency of a term within a text and the number of training-set texts that contain the term, but also the term's distribution coefficient (its dispersion) and the gradient difference of the feature vector, so that the weights adapt to dynamic text classification. Experimental results show that classification with DATW term weights outperforms classification with TF-IDF weights.
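For orientation, the TF-IDF baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, assuming the common variant weight = (term count / text length) × log(N / df); the abstract does not state which TF-IDF variant the paper uses, and it gives no formula for DATW itself, so the sketch covers the baseline only. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (illustrative only, not data from the paper).
docs = [
    ["stock", "market", "stock"],
    ["football", "match", "market"],
    ["football", "goal"],
]
w = tf_idf(docs)
```

In this variant a term that occurs in every text receives weight zero, and the weight ignores how the term is spread across classes. That second point is the kind of information the abstract says DATW adds through its distribution coefficient.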
