
Towards Statistical Machine Translation with Unification Grammars



Abstract

Traditional Statistical Machine Translation (SMT) models account poorly for many linguistic phenomena, such as subject-verb agreement and differences in word order between languages. Recent work, such as that on factored phrase-based models, has shown promising improvements in translation quality through the use of linguistically richer models. Unification-based approaches to grammar offer a framework for modelling agreement, a particular problem in generating morphologically rich languages. To gauge the potential gains available from applying them to SMT, we first consider how to automatically recognise and measure agreement failure. We focus on the specific issue of declension in German noun phrases and propose a simple unification-based approach to the problem. We develop an agreement checker based on this approach and use it to assess the agreement failure rate of a hierarchical phrase-based translation system trained on the small News Commentary corpus. Initially we find that our checker reports unreasonably high failure rates on the fluent training data; through an incremental process of failure analysis and lexicon refinement we significantly reduce the number of spurious failures. We then apply the agreement checker directly to machine translation by incorporating it as a feature function of the log-linear model. We train our baseline system on the larger Europarl corpus and again measure failure rates before applying the agreement check as both a hard and a soft constraint. The effects on translation are not large enough to measure reliably using standard automatic evaluation techniques, so we perform a manual analysis of the types of change introduced.
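
The abstract does not give implementation details, so the following is only a minimal sketch of how a unification-style agreement check over German noun phrases might look, and how its failure count could enter a log-linear score as a soft or hard constraint. Everything in it is an assumption made for illustration: the toy lexicon (hand-written, incomplete, singular forms after a definite article only), the names `unify`, `agreement_failures`, and `rescore`, the choice to skip phrases containing unknown words, and the weight value, which a real system would tune rather than fix by hand.

```python
# Toy lexicon (hypothetical): each word form maps to the set of
# (case, gender, number) analyses it is compatible with.
GENDERS = ("masc", "fem", "neut")

LEXICON = {
    "der": {("nom", "masc", "sg"), ("dat", "fem", "sg"), ("gen", "fem", "sg")},
    "den": {("acc", "masc", "sg")},
    "dem": {("dat", "masc", "sg"), ("dat", "neut", "sg")},
    "die": {("nom", "fem", "sg"), ("acc", "fem", "sg")},
    "das": {("nom", "neut", "sg"), ("acc", "neut", "sg")},
    # Adjective forms as used after a definite article.
    "alte": {("nom", "masc", "sg"), ("nom", "fem", "sg"), ("acc", "fem", "sg"),
             ("nom", "neut", "sg"), ("acc", "neut", "sg")},
    "alten": {("acc", "masc", "sg")}
             | {(c, g, "sg") for c in ("dat", "gen") for g in GENDERS},
    "Mann": {(c, "masc", "sg") for c in ("nom", "acc", "dat")},
    "Frau": {(c, "fem", "sg") for c in ("nom", "acc", "dat", "gen")},
    "Haus": {(c, "neut", "sg") for c in ("nom", "acc", "dat")},
}


def unify(words, lexicon=LEXICON):
    """Intersect the analyses of all words in a noun phrase.

    With atomic feature values, unification reduces to set intersection.
    Returns None if any word is missing from the lexicon (no judgement),
    an empty set on agreement failure, or the surviving analyses.
    """
    analyses = None
    for word in words:
        if word not in lexicon:
            return None
        entry = lexicon[word]
        analyses = set(entry) if analyses is None else analyses & entry
    return analyses


def agreement_failures(noun_phrases, lexicon=LEXICON):
    """Count noun phrases whose words share no common analysis."""
    return sum(1 for np in noun_phrases if unify(np, lexicon) == set())


def rescore(base_score, noun_phrases, weight=-1.0, hard=False):
    """Add the failure count to a log-linear score as one weighted feature.

    Soft constraint: each failure contributes `weight` to the score.
    Hard constraint: any failure rules the hypothesis out entirely.
    """
    failures = agreement_failures(noun_phrases)
    if hard and failures > 0:
        return float("-inf")
    return base_score + weight * failures
```

In this sketch, `unify(["der", "alte", "Mann"])` leaves a single nominative masculine singular analysis, while `unify(["dem", "alte", "Mann"])` leaves none and is counted as a failure. Treating unknown words as "no judgement" is one way to avoid spurious failures caused by gaps in the lexicon, the kind of problem the abstract says was reduced through lexicon refinement.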

Bibliographic record

  • Author: Williams, Philip
  • Year: 2009
  • Original format: PDF
  • Language: English
