Conference: 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery

A combined method for automatic domain-specific Terminology extraction

Abstract

In this paper we present a terminology extraction algorithm that combines machine learning with a corpus-based statistical model. We collect a balanced corpus in which all possible nominal terms of every domain are annotated, and use it as the training corpus. After selecting training features for terms, we use an SVM to recognize terminological candidates in the target corpus. We then calculate the Domain Relevance (DR) and Domain Consensus (DC) scores of these candidates to obtain domain-specific terms. We run four experiments on a Tourism corpus and on short sentences, using two kinds of balanced training corpora. We evaluate the precision and recall of the algorithm by comparing the words extracted by our system against the words in a gold standard. The experiments show that the algorithm can achieve improved results in the automatic extraction of nominal domain-specific terms. A detailed analysis shows the advantages and disadvantages of the algorithm.
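
The abstract names the DR and DC scores and the precision/recall evaluation but gives no formulas. The sketch below is illustrative only and is not the authors' implementation. It assumes the definitions commonly used in the terminology extraction literature: DR as the term's probability in the target domain divided by its highest probability in any domain, and DC as the entropy of the term's distribution over the documents of the domain. The function names, the toy corpora, and the restriction to single-token terms are all assumptions, and the SVM candidate-recognition step is omitted.

```python
import math


def domain_relevance(term, domain, corpora):
    """Assumed DR: P(term | domain) / max over all domains of P(term | d)."""
    def prob(name):
        docs = corpora[name]
        total = sum(len(doc) for doc in docs)        # tokens in the domain
        count = sum(doc.count(term) for doc in docs)  # occurrences of term
        return count / total if total else 0.0

    best = max(prob(name) for name in corpora)
    return prob(domain) / best if best > 0 else 0.0


def domain_consensus(term, domain, corpora):
    """Assumed DC: entropy of the term's spread over the domain's documents."""
    counts = [doc.count(term) for doc in corpora[domain]]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def precision_recall(extracted, gold):
    """Compare extracted terms with a gold standard, both treated as sets."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: each domain is a list of tokenized documents.
corpora = {
    "tourism": [["hotel", "tour", "hotel", "guide"],
                ["hotel", "tour", "beach", "ticket"]],
    "finance": [["bank", "loan", "hotel", "rate"],
                ["bank", "stock", "rate", "fund"]],
}

if __name__ == "__main__":
    for term in ("hotel", "tour", "beach"):
        dr = domain_relevance(term, "tourism", corpora)
        dc = domain_consensus(term, "tourism", corpora)
        print(term, round(dr, 3), round(dc, 3))
    print(precision_recall(["hotel", "tour", "bank"],
                           ["hotel", "tour", "beach", "guide"]))
```

Under these assumed definitions, a term that is frequent in the target domain relative to other domains gets a high DR, and a term spread evenly across the domain's documents gets a high DC. How the two scores are combined or thresholded is not stated in the abstract, so the sketch leaves that step out.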
