Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study

Abstract

The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria, arriving at two very surprising results. First, despite the strong intuitive appeal of more recent evaluation metrics, training to these metrics is no better than the older, traditional approach of training to BLEU. Second, the most widely used learning algorithm for training machine translation systems, minimum error rate training (MERT), works no better than standard machine learning algorithms such as log-linear models. This result demonstrates that machine translation does not require a special-purpose learning algorithm, but rather can be approached in a manner similar to other natural language processing and machine learning tasks. These results have a number of important implications. Contrary to existing beliefs, work on improving machine translation evaluation metrics and then training to the improved metrics will not in itself result in improved translation systems. Even more significantly, the widespread use of MERT has limited the sort of models that can be used for machine translation, as it does not scale well to large numbers of features. If it is not necessary to use MERT to train competitive systems, machine translation can be treated like any other natural language processing task, with models that include arbitrarily large feature sets.

Bibliographic details

  • Author: Cer, Daniel
  • Year: 2011
  • Original format: PDF
