Journal: Computer Engineering and Design (《计算机工程与设计》)

Text Classification Method Using Term-Class Weight and Term-Class Density

         

Abstract

To obtain more accurate text classification results and to examine how the choice of classifier affects them, this paper proposes a text classification method based on Term-Class Weight and Term-Class Density, studied with SVM and k-NN classifiers. Term-Class Weight is the ratio of the total number of documents containing a term to the number of documents of a class that contain the term. Term-Class Density is the ratio of the number of occurrences of a term in the class of interest to the number of its occurrences in the entire corpus. These two features serve as the measures for text classification. Labeled documents are assigned to the known classes, the proposed measures are used to predict the degree of relevance of a given object, and a classifier then performs the classification. Experiments on the 20 Newsgroups data set show that, compared with other similar methods, the proposed method achieves higher classification accuracy and performs better on recall and F-measure, giving it potential application value.
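The two measures defined in the abstract can be illustrated with a minimal Python sketch. It follows the ratios only as the abstract words them. The full paper may apply smoothing or normalization that is not shown here. The function name `term_class_stats`, the toy corpus, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration, not part of the source.

```python
def term_class_stats(docs, labels, term, cls):
    """Return (term_class_weight, term_class_density) for one term and one class.

    docs   : list of tokenized documents (each a list of strings)
    labels : class label of each document, parallel to docs
    Hypothetical helper; ratios follow the abstract's wording only.
    """
    # Document counts: documents containing the term, overall and within cls.
    df_total = sum(1 for d in docs if term in d)
    df_class = sum(1 for d, y in zip(docs, labels) if y == cls and term in d)

    # Occurrence counts: how often the term appears, overall and within cls.
    tf_total = sum(d.count(term) for d in docs)
    tf_class = sum(d.count(term) for d, y in zip(docs, labels) if y == cls)

    # Term-Class Weight: all documents containing the term divided by
    # the documents of the class containing it (0.0 if undefined; assumption).
    weight = df_total / df_class if df_class else 0.0
    # Term-Class Density: occurrences in the class divided by
    # occurrences in the whole corpus (0.0 if undefined; assumption).
    density = tf_class / tf_total if tf_total else 0.0
    return weight, density


# Toy corpus (invented for illustration, not from the 20 Newsgroups data).
docs = [
    ["gpu", "driver", "gpu"],
    ["gpu", "game"],
    ["team", "game", "game"],
    ["team", "score"],
]
labels = ["comp", "comp", "sport", "sport"]

weight, density = term_class_stats(docs, labels, "game", "sport")
print(weight, density)
```

In this sketch, a term concentrated in one class gives a density near 1 and a weight near 1, while a term spread across classes gives a lower density and a larger weight. Values like these could then be used as features for an SVM or k-NN classifier, which is the role the abstract assigns to them.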
