
Domain specific adaptation of a statistical machine translation engine in Slovene language


Abstract

Machine translation, especially statistical machine translation, has gained a lot of interest in recent years, mainly thanks to the growth of publicly available multilingual language resources. Most free machine translation systems give satisfactory results when the goal is a basic understanding of the target-language text, but they are not accurate enough for domain-specific texts. For some languages, research shows that the quality of machine translation increases when the system is trained with in-domain data. Such research has not yet been conducted for the Slovenian language, which is the motivation for our research. An additional motivation is the lack of a publicly available language model for the Slovenian language.

This master's thesis focuses on adapting a statistical machine translation system to a specific domain in the Slovenian language. Various approaches to domain adaptation are described. We set up the Moses machine translation framework, then acquire and adapt existing general corpora for the Slovenian language as a basis for building a comparative language model. The Slovenian corpus ccGigafida, in annotated and non-annotated form, is used to create a language model of the Slovenian language. For the pharmaceutical domain, existing English-Slovenian translations and other linguistic resources have been found and adapted to serve as training data for the machine translation system. We evaluate the impact of various linguistic resources on the quality of machine translation for the pharmaceutical domain. The evaluation is conducted automatically using the BLEU metric. In addition, some test translations are manually evaluated by experts and potential system users.

The analysis shows that test translations produced with the domain model achieve better results than translations generated with the out-of-domain model. Surprisingly, the bigger, combined model does not achieve better results than the smaller domain model. The manual analysis of fluency and adequacy shows that translations with a high BLEU score can receive lower fluency or adequacy grades than test translations with a lower BLEU score. The experiment in which a domain-based dictionary is added to the in-domain translation model shows a gain of 1 BLEU point and ensures the use of the desired terminology.
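To make the automatic evaluation step more concrete, the following is a minimal sketch of how a corpus-level BLEU score can be computed. The abstract does not say which BLEU implementation or settings the thesis used, so everything here is an assumption for illustration: the function names `ngrams` and `corpus_bleu` are hypothetical, tokenization is plain whitespace splitting, there is one reference per sentence, n-grams up to order 4 are weighted uniformly, and no smoothing is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Illustrative sketch only: whitespace tokenization, uniform n-gram
    weights, no smoothing. Not the evaluation script used in the thesis.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams in the hypotheses, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, a zero match count at any order gives 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Under these assumptions, a hypothesis identical to its reference scores 100, and a hypothesis that shares no 4-gram with its reference scores 0. Because the score only counts n-gram overlap, it can diverge from human fluency and adequacy judgments, which is consistent with the manual evaluation described in the abstract.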

Bibliographic details

  • Author: Kadivec Jože
  • Year: 2016
  • Original format: PDF
