An improved method of term weighting for text classification

Abstract

In text classification, term weighting methods assign appropriate weights to terms in order to improve classification performance. Traditional term weighting algorithms consider only factors such as tf (term frequency) and idf (inverse document frequency). This approach simply treats low-frequency terms as important and high-frequency terms as unimportant, so it frequently assigns higher weights to rare terms. In this paper, we present an effective term weighting approach that avoids this deficiency of the traditional approach, and we use kNN classifiers to evaluate it on the widely used benchmark data set Reuters-21578. The experimental results show that the new approach can improve classification accuracy.
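
The traditional weighting that the abstract criticizes can be illustrated with a minimal sketch. It assumes one common tf-idf variant (raw term counts multiplied by the unsmoothed natural-log idf, log(N / df)); the abstract does not say which variant the authors use, and it does not describe their improved scheme, so only the baseline is shown. The function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = tf * log(N / df), with raw counts for tf.
    """
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (made up for illustration, not taken from Reuters-21578)
docs = [
    ["oil", "price", "rise"],
    ["oil", "price", "fall"],
    ["oil", "wheat", "export"],
]
w = tfidf_weights(docs)
```

In this toy corpus, "oil" occurs in every document and receives weight 0, while "rise" occurs in only one document and receives the largest weight in the first document. This is the bias toward rare terms that the abstract describes.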
