
Extracting Chinese multi-word terms from small corpus

Abstract

In this paper, we present an automatic terminology extraction approach for Chinese multi-word terms. Besides five linguistic rules acquired from an available term list by machine learning methods, the term extraction system involves two statistical strategies: a termhood measure based on term distribution variation, and a unithood measure that adopts the left and right entropy method to estimate the degree of collocation variation. Candidates are ranked by the termhood value. The unithood measure is used to filter out prepositional phrases and some verb-object phrases that rarely appear as terms. Validated on a small-scale corpus in the computer domain, the approach reaches a precision of 91.5% on the top 2000 outputs.
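The left and right entropy idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives neither the exact formula nor any thresholds. The function names, the toy corpus, the use of base-2 Shannon entropy, and the `<s>` / `</s>` boundary markers are all assumptions made for illustration, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def neighbor_entropy(neighbors):
    """Shannon entropy (in bits) of the observed neighboring tokens."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def left_right_entropy(sentences, candidate):
    """Return (left_entropy, right_entropy) for a candidate token sequence.

    `sentences` is a list of token lists (assumed pre-segmented).
    Sentence boundaries are counted as neighbors via hypothetical
    "<s>" and "</s>" markers; this is an illustrative choice.
    """
    candidate = tuple(candidate)
    n = len(candidate)
    left, right = [], []
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == candidate:
                left.append(tokens[i - 1] if i > 0 else "<s>")
                right.append(tokens[i + n] if i + n < len(tokens) else "</s>")
    return neighbor_entropy(left), neighbor_entropy(right)


# Toy corpus (hypothetical data, not from the paper).
sentences = [
    ["使用", "操作", "系统", "内核"],
    ["安装", "操作", "系统", "补丁"],
    ["在", "操作", "系统", "内核"],
]
left_h, right_h = left_right_entropy(sentences, ("操作", "系统"))
```

A candidate whose neighbors vary widely on both sides gets high entropy on both sides, which suggests it behaves as an independent unit. A low value on one side suggests the candidate usually occurs inside a longer fixed sequence. How the paper turns these scores into a filter for prepositional and verb-object phrases is not stated in the abstract.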
