Journal: Computer Engineering and Design (计算机工程与设计)

An Automatic Feature Item Extraction Method for Domain Concept Term Extraction

     

Abstract

Domain concept term extraction suffers from two problems: the feature items are drawn from domain text collections that have to be gathered manually, and feature item extraction is not accurate enough. To address both, this paper proposes an automatic feature item extraction method. The method first uses a third-party interface to retrieve a large domain text collection from a literature repository and performs paragraph analysis on it. In the text preprocessing stage, an improved dictionary-free word segmentation method is applied as a second segmentation pass. Feature items are then extracted by combining TF-IDF, the chi-square test, information gain (IG), and a word position weight. Experimental results show that the method automates feature item extraction and achieves relatively high accuracy.
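The abstract names four scoring signals (TF-IDF, chi-square, information gain, and word position weight) but does not say how they are combined. The following is a minimal sketch under stated assumptions, not the paper's implementation: the toy corpus, the `TITLE_WEIGHT` value, the function names, and the equal-weight average of normalized scores are all hypothetical choices made for illustration. Tokens are assumed to be already segmented; the paper's improved dictionary-free segmenter is not reproduced here.

```python
import math
from collections import Counter

# Toy corpus (assumption): (label, title tokens, body tokens), already segmented.
DOCS = [
    ("domain", ["ontology", "term"], ["ontology", "concept", "term", "extraction"]),
    ("domain", ["concept", "term"], ["term", "feature", "ontology"]),
    ("other", ["weather"], ["rain", "feature", "forecast"]),
    ("other", ["sports"], ["match", "score", "forecast"]),
]
TITLE_WEIGHT = 2.0  # hypothetical position weight: a title occurrence counts double


def tfidf_scores(docs, target="domain"):
    """Position-weighted TF-IDF, summed over the target-class documents."""
    n = len(docs)
    df, tfs = Counter(), []
    for _, title, body in docs:
        tf = Counter()
        for tok in title:
            tf[tok] += TITLE_WEIGHT
        for tok in body:
            tf[tok] += 1.0
        tfs.append(tf)
        df.update(tf.keys())  # document frequency: one count per document
    scores = Counter()
    for (label, _, _), tf in zip(docs, tfs):
        if label == target:
            for tok, freq in tf.items():
                scores[tok] += freq * math.log(n / df[tok])
    return scores


def contingency(docs, term, target="domain"):
    """2x2 document counts: a=in class with term, b=out of class with term,
    c=in class without term, d=out of class without term."""
    a = b = c = d = 0
    for label, title, body in docs:
        present = term in title or term in body
        if label == target:
            a, c = a + present, c + (not present)
        else:
            b, d = b + present, d + (not present)
    return a, b, c, d


def chi_square(a, b, c, d):
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(pos, neg):
    total = pos + neg
    return -sum(k / total * math.log2(k / total) for k in (pos, neg) if k)


def info_gain(a, b, c, d):
    n = a + b + c + d
    return (entropy(a + c, b + d)
            - (a + b) / n * entropy(a, b)
            - (c + d) / n * entropy(c, d))


def normalize(scores):
    hi = max(scores.values())
    return {t: (s / hi if hi else 0.0) for t, s in scores.items()}


def rank_features(docs, top_k=3, target="domain"):
    # Candidates are restricted to terms seen in the target-class documents,
    # because chi-square and IG alone also reward terms typical of the other class.
    vocab = {tok for label, title, body in docs if label == target
             for tok in title + body}
    tfidf = tfidf_scores(docs, target)
    cells = {t: contingency(docs, t, target) for t in vocab}
    parts = [
        normalize({t: tfidf[t] for t in vocab}),
        normalize({t: chi_square(*cells[t]) for t in vocab}),
        normalize({t: info_gain(*cells[t]) for t in vocab}),
    ]
    # Equal-weight average of the three normalized scores (an assumption).
    combined = {t: sum(p[t] for p in parts) / len(parts) for t in vocab}
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    for term, score in rank_features(DOCS):
        print(f"{term}\t{score:.3f}")
```

On this toy corpus the highest-ranked candidates are the terms that occur only in the domain documents and also appear in titles. A term such as `feature`, which is spread evenly across both classes, scores zero on chi-square and information gain and drops down the ranking.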
