Conference: International Conference on Asian Language Processing

Supervised learning for robust term extraction

Abstract

We propose a machine learning method that automatically classifies n-grams extracted from a corpus as terms or non-terms. As training features we use 10 statistics that are common in the term extraction literature. The method applies to term recognition across multiple domains and languages, and it helps to 1) avoid laborious post-processing (e.g. subjective threshold setting); and 2) handle skewness and show noticeable resilience to domain shift in the training data. Experiments are carried out on 6 corpora spanning multiple domains and languages: GENIA and ACL RD-TEC (1.0) serve as the training set, and four TTC subcorpora on wind energy and mobile technology, in both Chinese and English, serve as the test set. The results are promising and indicate that the approach can identify both single-word and multiword terms with reasonably good precision and recall.
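
The pipeline described in the abstract (extract candidate n-grams, compute statistics for each, train a classifier to separate terms from non-terms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the 10 statistics nor the learning algorithm, so the three features here (log term frequency, document-frequency ratio, n-gram length), the logistic regression classifier, the toy corpus, and the labels are all hypothetical stand-ins.

```python
import math
from collections import Counter

def extract_ngrams(tokens, max_n=2):
    """Return every contiguous n-gram (as a tuple) of length 1..max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def ngram_features(docs, max_n=2):
    """Map each candidate n-gram to three illustrative statistics (assumed,
    not the paper's 10): log(1 + term frequency), document-frequency ratio,
    and length in words."""
    tf, df = Counter(), Counter()
    for tokens in docs:
        grams = extract_ngrams(tokens, max_n)
        tf.update(grams)        # total occurrences across the corpus
        df.update(set(grams))   # number of documents containing the n-gram
    return {g: [math.log(1 + tf[g]), df[g] / len(docs), float(len(g))]
            for g in tf}

def train_logistic(X, y, epochs=2000, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent.
    The choice of classifier is an assumption made for this sketch."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = t - 1.0 / (1.0 + math.exp(-z))  # label minus predicted probability
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def is_term(w, b, x):
    """Classify a feature vector: a positive score means 'term'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

# Hypothetical toy corpus and hand-made labels (1 = term, 0 = non-term).
docs = [
    ["the", "wind", "turbine", "is", "large"],
    ["the", "wind", "turbine", "is", "tall"],
    ["the", "rotor", "blade", "is", "long"],
    ["the", "tower", "is", "tall"],
]
labels = {
    ("wind", "turbine"): 1,
    ("rotor", "blade"): 1,
    ("wind",): 1,
    ("the",): 0,
    ("is",): 0,
}

feats = ngram_features(docs)
w, b = train_logistic([feats[g] for g in labels], list(labels.values()))

# Score every candidate, including n-grams that were never labelled.
predicted_terms = sorted(g for g, x in feats.items() if is_term(w, b, x))
```

Because the classifier learns the decision boundary from labelled candidates, no manual cut-off on any single statistic is needed, which is the post-processing step the abstract says the method avoids. On a toy corpus this small the predictions for unlabelled n-grams are not meaningful; the sketch only shows the shape of the pipeline.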
