Tree-adjoining machine translation.

Abstract

Machine Translation (MT) is the task of translating a document from a source language (e.g., Chinese) into a target language (e.g., English) via computer. State-of-the-art statistical approaches to MT use large collections of human-translated documents as training material, gathering statistics on the patterns of correspondence between languages according to the features specified by the translation model. Using this bilingual translation model in conjunction with a target language model, created by gathering statistics from a large monolingual corpus, a new document in the source language can be automatically translated into its target-language equivalent with surprising accuracy.

Much MT research focuses on types of the patterns and features to include in a translation model. Recent statistical MT models have used syntax trees to enforce grammaticality, but the currently popular tree substitution models only memorize sequences of words or constituents, specifying exactly what phrases to use and exactly what trees are grammatical, which does not generalize well. Adding the operation of tree-adjoining provides the freedom to splice additional information into an existing grammatical tree. An adjoining translation model allows general, linguistically-motivated translation patterns to be learned without the clutter of endless variations of optional material. The appropriate modifiers, such as adjectives, adverbs, and prepositional phrases, can be grafted into these core patterns as needed to translate details. We show that the increased generalization power provided by adjoining, when used carefully, improves MT quality without becoming computationally intractable.

In this thesis, we describe challenges encountered by both word-sequence-based and syntax-tree-based MT systems today, and present an in-depth, quantitative comparison of both models. Then we describe a novel model for statistical MT which addresses these challenges using a synchronous tree-adjoining grammar. We introduce a method of converting these grammars to a weakly equivalent tree transducer for decoding. Then we present a method for learning the rules and associated probabilities of this grammar from aligned tree/string training data, and empirically analyze important characteristics of the resulting model, considering and evaluating many variations. Finally, our results show that adjoining delivers a consistent improvement over a baseline statistical syntax-based MT model on both medium and large-scale MT tasks using several language pairs.
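
As an illustration only, the adjoining operation mentioned in the abstract can be sketched on a single monolingual tree. This is not the thesis's implementation: the thesis uses a synchronous grammar with learned probabilities, while the sketch below only shows how an auxiliary tree carrying a modifier is spliced into an existing tree. The names `Node`, `find_foot`, and `adjoin` are hypothetical and chosen for this example.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A syntax-tree node; a node without children is a word (or a bare foot node)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    is_foot: bool = False  # marks the foot node of an auxiliary tree

    def leaves(self) -> List[str]:
        """Return the labels of the leaf nodes, left to right."""
        if not self.children:
            return [self.label]
        out: List[str] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def find_foot(node: Node) -> Optional[Node]:
    """Depth-first search for the foot node of an auxiliary tree."""
    if node.is_foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux: Node) -> None:
    """Adjoin a copy of the auxiliary tree `aux` at `target`, in place.

    The material below `target` is moved under the foot node of `aux`,
    and the body of `aux` takes its place under `target`.
    """
    aux = copy.deepcopy(aux)  # keep the caller's auxiliary tree reusable
    foot = find_foot(aux)
    if foot is None or foot is aux:
        raise ValueError("auxiliary tree needs a foot node below its root")
    if not (aux.label == foot.label == target.label):
        raise ValueError("root, foot, and adjunction site must share a label")
    foot.children = target.children
    foot.is_foot = False
    target.children = aux.children


# Core pattern: "the car stopped"
nn = Node("NN", [Node("car")])
tree = Node("S", [
    Node("NP", [Node("DT", [Node("the")]), nn]),
    Node("VP", [Node("VBD", [Node("stopped")])]),
])

# Auxiliary tree for an optional adjective: NN -> JJ(red) NN*
red = Node("NN", [Node("JJ", [Node("red")]), Node("NN", is_foot=True)])

adjoin(nn, red)
print(" ".join(tree.leaves()))
```

The core pattern stays the same no matter how many modifiers are adjoined, which is the generalization the abstract attributes to adjoining: optional material does not have to be memorized as part of each rule.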

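Bibliographic record

  • Author

    DeNeefe, Steve

  • Author affiliation

    University of Southern California

  • Degree-granting institution: University of Southern California
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 171 p.
  • Total pages: 171
  • Original format: PDF
  • Language of text: English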
