An Improved Automatic Term Recognition Method for Spanish

Abstract

The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the output is refined. The first modification aims to maximise the number of real terms in the list of candidates through a new approach to applying the stop-list. The second adapts the C-value formula so that single-word terms are considered. The third changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of each candidate's context window is variable. We also show the linguistic modifications needed to apply this algorithm to the recognition of term candidates in Spanish.
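
The abstract does not give the adapted formula, so the following is only a minimal sketch of why single-word terms need special handling. It uses the standard C-value definition, in which a candidate's frequency is weighted by log2 of its length in words, so any one-word candidate scores zero. The `length_offset` parameter is an assumption made here for illustration (one common way to lift that restriction), not necessarily the adaptation the paper adopts. The function name `c_values`, the helper `contains`, and the toy frequencies are hypothetical.

```python
import math
from collections import defaultdict


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_values(freq, length_offset=1):
    """Score term candidates with a C-value-style measure.

    freq maps each candidate (a tuple of lemmas) to its corpus frequency.
    length_offset=0 gives the standard length weight log2(|a|), which is 0
    for single words. length_offset=1 is an ASSUMED adaptation that keeps
    single-word candidates in play.
    """
    # For every candidate, collect the longer candidates that contain it.
    nested_in = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and contains(b, a):
                nested_in[a].append(b)

    scores = {}
    for a, f_a in freq.items():
        weight = math.log2(len(a) + length_offset)
        longer = nested_in.get(a)
        if longer:
            # Discount occurrences explained by longer candidates:
            # subtract the mean frequency of the containing candidates.
            f_a = f_a - sum(freq[b] for b in longer) / len(longer)
        scores[a] = weight * f_a
    return scores


# Toy lemmatised candidates with made-up frequencies.
freq = {
    ("sistema", "operativo"): 5,
    ("sistema",): 8,
    ("red",): 3,
}
scores = c_values(freq)
```

With these toy counts, the single-word candidate `("sistema",)` keeps a non-zero score after its occurrences inside `("sistema", "operativo")` are discounted, whereas calling `c_values(freq, length_offset=0)` scores every single-word candidate as zero.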
