Journal: 《计算机与数字工程》 (Computer & Digital Engineering)

A Text Feature Extraction Method Based on Concept Relations

     

Abstract

TF-IDF text feature extraction relies on word-frequency statistics and does not account for the relations between concepts in a text, so the features it extracts suffer from concept redundancy and ambiguity. To address this, a word-frequency statistics method based on ontology concept similarity is proposed. The semantic similarity between text elements is used to adjust the term frequency of feature elements, which highlights their semantic contribution, eliminates feature redundancy, and strengthens the independence of the elements in the feature set. Finally, the co-occurrence characteristics of the concepts in the text are used to handle important feature elements that might otherwise be overlooked by word-frequency statistics, so that text features are extracted accurately and efficiently.
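The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of adjusting term frequency by concept similarity before TF-IDF weighting. Everything in it is an assumption made for illustration: the hand-written `SIMILARITY` table stands in for an ontology-based similarity measure, the `threshold` of 0.8 and the merge rule (fold a term's count, weighted by similarity, into the first sufficiently similar representative) are hypothetical, and the smoothed IDF is a common variant rather than the authors' choice. The co-occurrence step is not covered.

```python
import math
from collections import Counter

# Hypothetical concept similarities; a real system would derive these
# from an ontology rather than a hand-written table.
SIMILARITY = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "vehicle")): 0.6,
}


def sim(a, b):
    """Return the assumed concept similarity of two terms, in [0, 1]."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def adjusted_tf(tokens, threshold=0.8):
    """Fold counts of near-synonymous terms into one representative term."""
    counts = Counter(tokens)
    # Visit terms from most to least frequent (ties broken alphabetically)
    # so that the most common surface form becomes the representative.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    merged = {}
    for term in ordered:
        for rep in merged:
            s = sim(term, rep)
            if s >= threshold:
                # Credit the representative with a similarity-weighted count.
                merged[rep] += s * counts[term]
                break
        else:
            merged[term] = float(counts[term])
    total = len(tokens)
    return {t: c / total for t, c in merged.items()}


def idf(term, corpus):
    """Smoothed inverse document frequency over a list of token lists."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0


def extract_features(tokens, corpus, top_k=3):
    """Rank the merged terms of one document by adjusted TF times IDF."""
    tf = adjusted_tf(tokens)
    scores = {t: tf[t] * idf(t, corpus) for t in tf}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    doc = ["car", "automobile", "car", "road", "vehicle"]
    corpus = [doc, ["road", "map"], ["road", "trip"]]
    print(adjusted_tf(doc))
    print(extract_features(doc, corpus, top_k=2))
```

In this toy document, "automobile" is absorbed into "car" because their assumed similarity exceeds the threshold, while "vehicle" stays a separate feature. That illustrates how merging near-synonyms can reduce redundancy in the feature set without collapsing loosely related concepts.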
