Journal: 计算机应用与软件 (Computer Applications and Software)

Term Weight Calculation Method Based on the Mathematical Expectation of Terms

Abstract

The formal representation of text has always been a fundamental problem in text mining. The TFIDF (Term Frequency-Inverse Document Frequency) calculation in the vector space model is a classical and effective term weight calculation method for text representation. After analysing the problems of the traditional TFIDF calculation, this paper addresses two of them: TFIDF does not consider how the documents containing a term are distributed across categories, and it does not account for the different numbers of documents in each category. To improve on this, the paper proposes using the mathematical expectation of the term as a text factor, which gives the TFIDF-E method. Experimental results show that text classification using the TFIDF-E representation performs better than classification using traditional TFIDF, which validates the effectiveness and feasibility of the TFIDF-E method.
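
For orientation, here is a minimal Python sketch of the classical TFIDF weight that the abstract takes as its baseline, assuming the common form tf × log(N / df). The abstract does not state the TFIDF-E formula, so it is not reproduced here. The function names, the helper `category_distribution`, and the toy corpus are all hypothetical; the helper only illustrates the per-category statistic that, according to the abstract, plain TFIDF ignores.

```python
import math
from collections import Counter

def tfidf(docs):
    """Classical TF-IDF weights for a list of tokenised documents.

    Assumes the common form tf * log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def category_distribution(docs, labels, term):
    """Fraction of each category's documents that contain `term` (illustrative helper)."""
    totals = Counter(labels)
    hits = Counter(label for doc, label in zip(docs, labels) if term in doc)
    return {label: hits[label] / totals[label] for label in totals}

# Toy corpus (an assumption for illustration, not data from the paper).
docs = [
    ["ball", "goal", "news"],
    ["team", "match", "ball"],
    ["stock", "market", "news"],
    ["stock", "price"],
]
labels = ["sport", "sport", "finance", "finance"]

weights = tfidf(docs)
ball_dist = category_distribution(docs, labels, "ball")
news_dist = category_distribution(docs, labels, "news")
```

In this toy corpus, "ball" and "news" each occur in two of the four documents, so they receive the same IDF and, in the first document, the same TFIDF weight. Yet "ball" appears only in the sport category, while "news" is spread evenly across both categories. This is the kind of category-level information the abstract says traditional TFIDF leaves out.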
