Journal: Automatic Control and Computer Sciences

Russian-Language Thesauri: Automatic Construction and Application for Natural Language Processing Tasks



Abstract

The paper surveys existing digital Russian-language thesauri and the methods for constructing them automatically and applying them. The authors analyzed the main characteristics of the thesauri published in open access for scientific research, and evaluated trends in their development and their effectiveness in solving natural language processing tasks. They studied statistical and linguistic methods of thesaurus construction that automate development and reduce the labor costs of expert linguists. In particular, they considered algorithms for extracting keywords and semantic thesaurus relations of all types, and assessed the quality of the thesauri generated with these tools. To illustrate the features of the various methods of constructing thesaurus relations, the authors developed a combined method that fully automatically generates a specialized thesaurus from a text corpus of a selected domain and several existing linguistic resources. The proposed method was applied in experiments on two Russian-language text corpora from different domains: articles on migration and tweets. The resulting thesauri were analyzed using an integrated assessment, developed by the authors in a previous study, that characterizes various aspects of a thesaurus and appraises the quality of the methods used to generate it. The analysis revealed the main advantages and disadvantages of the various approaches to thesaurus construction and to the extraction of semantic relations of different types, and identified potential focus areas for future research.
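As a rough illustration of what a statistical approach to keyword and relation extraction can look like, the sketch below ranks terms by TF-IDF and proposes associative links between terms that co-occur in several documents. It is not the authors' combined method, which the abstract does not specify. The toy corpus, the function names `tfidf_keywords` and `cooccurrence_relations`, and the `min_count` threshold are all assumptions made for this example, and the tokens are assumed to be lemmatized already.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is a list of lemmatized tokens.
# Real Russian text would first need tokenization and morphological analysis.
DOCS = [
    ["миграция", "население", "граница", "страна"],
    ["миграция", "труд", "рынок", "страна"],
    ["рынок", "труд", "зарплата"],
]


def tfidf_keywords(docs, top_n=3):
    """Rank terms by their TF-IDF score summed over all documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(top_n)]


def cooccurrence_relations(docs, keywords, min_count=2):
    """Propose associative links between keywords sharing >= min_count documents."""
    keyset = set(keywords)
    pair_counts = Counter()
    for doc in docs:
        present = sorted(keyset & set(doc))
        pair_counts.update(combinations(present, 2))
    return sorted(pair for pair, n in pair_counts.items() if n >= min_count)


if __name__ == "__main__":
    keywords = tfidf_keywords(DOCS, top_n=7)
    print(keywords)
    print(cooccurrence_relations(DOCS, keywords))
```

Co-occurrence only suggests that two terms are associated. Typed relations such as synonymy or hypernymy would need additional evidence, for example lexical patterns or existing linguistic resources of the kind the abstract mentions.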
