Exploiting linguistically-enriched models of phrase-based statistical machine translation

Abstract

This thesis presents the design and implementation of linguistically-informed models for statistical phrase-based machine translation. Using Koehn’s Pharaoh (2004), a state-of-the-art SMT system, and Moses (Hoang, 2006), a variant of the former which supports factored translation models, we have investigated two approaches: Combined Feature Models and Factored Models. While Combined Feature Models make use of concatenations of linguistic features to enrich their models, Factored Models view a token as a vector of factors, making it possible to build relatively independent models for each factor. In the context of machine translation, both models were expected to enrich the existing surface word model with additional linguistic information.

The research undertaken focused on finding ways to improve output translation quality for English-to-French and French-to-English translations from various standpoints. Better general readability and understandability of a generated document should be achieved mainly by ensuring the text’s fluency in the target language (syntactic correctness), its adequacy (use of adequate terminology) and its fidelity (semantic adequacy). These main goals were addressed by first analysing Pharaoh’s current performance and understanding the language-specific and model-related problems encountered. Several experiments were then performed using our two approaches, and their results were compared.

Despite a few noted improvements in some of the linguistic issues discussed, notably fixed expression translation and part-of-speech ambiguity, major problems involving complex syntactic structures in the source language still posed a hard challenge to the approach of linguistically augmenting phrase-based statistical machine translation.
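
The difference between the two approaches named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the `Token` fields (surface, lemma, part of speech), the `_` separator for combined features, and the helper names are all assumptions chosen for illustration. The `|` separator follows the convention Moses uses for factored corpora.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    # Hypothetical factor set; the thesis may use different factors.
    surface: str
    lemma: str
    pos: str


def combined_feature(token: Token, sep: str = "_") -> str:
    """Combined Feature style: fuse surface form and POS into one atomic symbol.

    The separator is an assumption; the resulting string is treated by the
    translation system as a single enriched "word".
    """
    return f"{token.surface}{sep}{token.pos}"


def factored(token: Token, sep: str = "|") -> str:
    """Factored style: keep the token as a vector of factors, joined by '|'."""
    return sep.join(token)


def factor_streams(tokens: List[Token]) -> Dict[str, List[str]]:
    """Split a sentence into one sequence per factor.

    Each sequence could then feed its own, relatively independent model.
    """
    return {name: [getattr(t, name) for t in tokens] for name in Token._fields}


sentence = [Token("the", "the", "DT"), Token("books", "book", "NNS")]

print([combined_feature(t) for t in sentence])
print(" ".join(factored(t) for t in sentence))
print(factor_streams(sentence)["pos"])
```

In the combined form, `books_NNS` and `books_VBZ` become unrelated vocabulary items, which separates part-of-speech readings but fragments the data. The factored form keeps each factor addressable on its own, so a model over lemmas or tags can be estimated separately from the surface-word model.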

Bibliographic details

  • Author: Guthmann Noemie
  • Year: 2006
  • Original format: PDF
  • Language: English
