Journal of Xinjiang Agricultural University (新疆农业大学学报)

Research on Chinese Agricultural Web Page Classification Based on the Vector Space Model

     

Abstract

This paper analyzes the key techniques involved in text classification based on the vector space model: feature selection, feature vector representation, feature vector dimensionality, and evaluation criteria for text classification. To compare and verify how the feature word selection method, the feature vector representation, and the vector dimensionality affect classification, orthogonal experiments were run on 1,600 Chinese agricultural web pages. The factors were compared and analyzed, and the best-performing combination was identified. The results show that classification works well when feature words are selected with the combined document frequency (DFD, 综合文档频) method, feature vectors are represented by term frequency, and the feature vector has 300 dimensions: average precision reaches 92.63% and average recall reaches 91.5%.
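The pipeline named in the abstract (select feature words, build term-frequency vectors, compare documents in the vector space, report average precision and recall) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the abstract does not give the DFD formula, so plain document frequency stands in for it; documents are assumed to be already segmented into word tokens; cosine similarity is one common way to compare vectors in this model, not necessarily the authors' choice; and all function names are hypothetical.

```python
from collections import Counter
import math


def select_features(docs, k):
    """Return the k terms that occur in the most documents.

    Assumption: plain document frequency is a stand-in for the paper's
    DFD method, whose formula is not given in the abstract.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:k]


def tf_vector(tokens, features):
    """Represent a document as raw term counts over the selected features."""
    counts = Counter(tokens)
    return [counts[term] for term in features]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def macro_precision_recall(true_labels, predicted_labels):
    """Average per-class precision and recall over the classes in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)
```

In the paper's best configuration, `k` would be 300 and the vectors would hold term frequencies as in `tf_vector`. A classifier (for example, assigning each page to the class whose mean vector has the highest `cosine` score) would sit between the vector step and the evaluation step.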
