DEXTER: A workbench for automatic term extraction with specialized corpora

Abstract

Automatic term extraction has become a priority area of research within corpus processing. Despite the extensive literature in this field, some outstanding issues remain to be addressed in the construction of term extractors, particularly those intended to support research in terminology and terminography. This article describes the design and development of DEXTER, an online workbench for extracting simple and complex terms from domain-specific corpora in English, French, Italian and Spanish. In this framework, three features help to bring the most important terms to the foreground. First, in contrast to the elaborate morphosyntactic patterns proposed in most previous research, shallow lexical filters are used to discard term candidates. Second, a large number of common stopwords are detected automatically by a method that relies on the IATE database together with the frequency distributions of the domain-specific corpus and a general corpus. Third, the term-ranking metric, which is grounded in the notions of salience, relevance and cohesion, is guided by the IATE database to display an adequate distribution of terms.
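
The abstract does not give the stopword-detection formula, so the following is only a minimal sketch of the general idea of comparing frequency distributions across a domain-specific corpus and a general corpus. It is not DEXTER's actual method. The function names, the `max_ratio` threshold and the `known_terms` set (a stand-in for a lookup against a termbase such as IATE) are all assumptions made for illustration.

```python
from collections import Counter


def relative_freqs(tokens):
    """Map each token to its share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def stopword_candidates(domain_tokens, general_tokens,
                        max_ratio=1.5, known_terms=frozenset()):
    """Flag words that are no more typical of the domain than of general language.

    Hypothetical helper: max_ratio is an arbitrary illustrative threshold,
    and known_terms stands in for entries attested in a termbase.
    """
    domain = relative_freqs(domain_tokens)
    general = relative_freqs(general_tokens)
    flagged = set()
    for word, domain_freq in domain.items():
        general_freq = general.get(word)
        if general_freq is None or word in known_terms:
            # Unseen in general language, or attested as a term: keep it.
            continue
        if domain_freq / general_freq <= max_ratio:
            flagged.add(word)
    return flagged


domain_corpus = "the valve controls the flow of the fluid".split()
general_corpus = "the cat sat on the mat of the house".split()
candidates = stopword_candidates(domain_corpus, general_corpus)
```

With these toy corpora, only words that occur in both corpora at similar relative frequencies are flagged, while domain words such as "valve" or "fluid" are left for the ranking stage. A real system would need much larger corpora and a tuned threshold.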
Bibliographic record

  • Source
    Natural Language Engineering | 2018, Issue 2 | pp. 163-198 | 36 pages
  • Author

    Perinan-Pascual Carlos;

  • Affiliation

    Univ Politecn Valencia, Appl Linguist Dept, Paranimf 1, Valencia 46730, Spain;

  • Original format: PDF
  • Text language: eng
  • Date added to database: 2022-08-18 02:08:37
