Chinese journal: Application Research of Computers (计算机应用研究)

Research on a Text Mining Algorithm Based on Inverse Document Frequency Mutual Information

Abstract

Traditional text classification algorithms treat every feature word as having the same influence on the classification result. This lowers classification accuracy and also increases the algorithm's time complexity. Building on an analysis of the general model of text classification systems, and on feature-term extraction with mutual-information-based feature selection, this paper proposes a text classification algorithm based on inverse document frequency mutual information entropy. The algorithm first extracts features from the text sample vectors using the vector space model (VSM). It then extracts a keyword set from the text, screens the keywords, and uses mutual information to represent and compute the relevance between each term and the document categories. Finally, it computes the weight of each keyword in the document. Experimental results show that, compared with traditional classification algorithms, the proposed improved algorithm computes faster, has a stronger nonlinear mapping ability, and classifies better in terms of both convergence speed and accuracy.
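The abstract names three steps (VSM feature extraction, term-category mutual information, keyword weighting) but does not give the paper's formulas. The following is a minimal sketch of those steps under stated assumptions, not the authors' method: whitespace tokenization stands in for Chinese word segmentation, term-class mutual information uses the common document-count estimate MI(t, c) = log(A·N / ((A+B)(A+C))), and the combined keyword weight tf × idf × max(0, max_c MI(t, c)) is a hypothetical choice. All function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization. Chinese text would first need word segmentation.
    return text.lower().split()


def term_frequencies(docs):
    """VSM step: represent each document as a sparse term-frequency vector."""
    return [Counter(tokenize(d)) for d in docs]


def inverse_document_frequency(tf_vectors):
    """idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    n = len(tf_vectors)
    df = Counter()
    for vec in tf_vectors:
        df.update(vec.keys())
    return {t: math.log(n / count) for t, count in df.items()}


def mutual_information(tf_vectors, labels):
    """Term-class MI estimated from document counts:

        MI(t, c) = log(A * N / ((A + B) * (A + C)))

    A: documents of class c containing t
    A + B: all documents containing t
    A + C: all documents of class c
    Pairs with A = 0 (MI = -inf) are omitted from the table.
    """
    n = len(tf_vectors)
    class_size = Counter(labels)
    docs_with_term = Counter()
    docs_with_term_in_class = Counter()
    for vec, c in zip(tf_vectors, labels):
        for t in vec:
            docs_with_term[t] += 1
            docs_with_term_in_class[(t, c)] += 1
    return {
        (t, c): math.log(a * n / (docs_with_term[t] * class_size[c]))
        for (t, c), a in docs_with_term_in_class.items()
    }


def keyword_weights(doc_vec, idf_table, mi_table, classes):
    """Hypothetical weight: tf(t, d) * idf(t) * max(0, max_c MI(t, c)).

    Missing (t, c) pairs (true MI = -inf) default to 0.0; the clip at 0 makes this harmless.
    """
    weights = {}
    for t, tf in doc_vec.items():
        best_mi = max((mi_table.get((t, c), 0.0) for c in classes), default=0.0)
        weights[t] = tf * idf_table.get(t, 0.0) * max(best_mi, 0.0)
    return weights


def top_keywords(weights, k):
    """Keyword screening: keep the k highest-weighted terms."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    docs = ["cheap loan offer", "loan rate offer", "football match offer", "match team score"]
    labels = ["finance", "finance", "sport", "sport"]
    vecs = term_frequencies(docs)
    idf_table = inverse_document_frequency(vecs)
    mi_table = mutual_information(vecs, labels)
    w = keyword_weights(vecs[0], idf_table, mi_table, set(labels))
    print(top_keywords(w, 3))  # ['cheap', 'loan', 'offer']
```

In the toy corpus, `offer` occurs in both classes, so it gets both a lower IDF and a lower maximum MI than the class-specific terms and ranks last among the first document's keywords. This illustrates the abstract's point that feature words should not all influence classification equally.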
