Journal: Computer Applications and Software (《计算机应用与软件》)

A Domain Term Extraction Method for Patent Abstracts

Abstract

In the patent domain, the quality of term extraction determines the quality of the ontology built from it. This paper proposes a term extraction method that automatically generates a filter dictionary and combines several influence factors, including lexical density. First, after word segmentation and part-of-speech tagging, the documents are matched against templates generated by a part-of-speech rule algorithm, which yields a set of candidate long terms and a set of single-word short terms. Next, a filter dictionary generated from document consistency is used to filter out part of the candidate long-term set. Finally, based on how long terms are composed, three term factors (lexical density, document difference ratio, and document consistency) are combined by weighted average to give the term weight of each long term, and the terms are ranked from highest to lowest weight. Experiments were conducted on a benchmark corpus of 8,000 patent abstracts; across five randomly selected groups of experimental data, the average accuracy reached 86%. The results indicate that the method is effective for domain term extraction.
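The final filtering and ranking step can be illustrated with a minimal Python sketch. The abstract does not give the formulas for the three factors, the weights used, or the exact way the filter dictionary is applied, so everything here is an assumption for illustration: the factor scores are taken as precomputed values in [0, 1], the weights default to equal, and filtering is done by exact match against the candidate string. The names `TermFactors`, `term_weight`, and `rank_terms` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class TermFactors:
    """Precomputed factor scores for one candidate long term (assumed in [0, 1])."""
    lexical_density: float
    doc_difference_ratio: float
    doc_consistency: float


def term_weight(f, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three factors; equal weights are an assumption."""
    w1, w2, w3 = weights
    total = w1 + w2 + w3
    return (w1 * f.lexical_density
            + w2 * f.doc_difference_ratio
            + w3 * f.doc_consistency) / total


def rank_terms(candidates, filter_dict, weights=(1.0, 1.0, 1.0)):
    """Drop candidates found in the filter dictionary, then sort by weight, highest first.

    Exact-match filtering is a simplification; the paper may filter differently.
    """
    kept = {t: f for t, f in candidates.items() if t not in filter_dict}
    scored = [(t, term_weight(f, weights)) for t, f in kept.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
candidates = {
    "lithium battery separator": TermFactors(0.9, 0.6, 0.9),
    "said device": TermFactors(0.2, 0.1, 0.3),
    "heat exchange tube": TermFactors(0.6, 0.6, 0.6),
}
filter_dict = {"said device"}

ranked = rank_terms(candidates, filter_dict)
for term, weight in ranked:
    print(f"{term}\t{weight:.2f}")
```

Keeping factor computation separate from the weighting makes it straightforward to substitute the paper's actual factor definitions and weights once they are known.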
