
Refinements in hierarchical phrase-based translation systems

Abstract

The relatively recently proposed hierarchical phrase-based translation model for statistical machine translation (SMT) has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements and analyses in various modules for hierarchical phrase-based systems.

We took the opportunity of increasing amounts of available training data for machine translation as well as existing frameworks for distributed computing in order to build better infrastructure for extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar using the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity. We demonstrate improvements over two alternative solutions used in machine translation.

The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by considering richer statistics from word alignment models in extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation and demonstrate translation improvements in Chinese to English translation.

This thesis also proposes refinements in grammar and language modelling both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for grammar and language model cross-domain adaptation. We also study interactions between first-pass and second-pass language model in terms of size and n-gram order. Finally, we analyse two smoothing methods for large 5-gram language model rescoring.

The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we consider as a means to study the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system.

Bibliographic details

  • Author

    Pino Juan Miguel

  • Author affiliation
  • Year 2015
  • Total pages
  • Original format PDF
  • Language of text en
  • Chinese Library Classification
