Preference Grammars and Decoding Algorithms for Probabilistic Synchronous Context Free Grammar Based Translation.

Abstract

Probabilistic Synchronous Context-free Grammars (PSCFGs) [Aho and Ullman, 1969; Wu, 1996] define weighted transduction rules to represent translation and reordering operations. When translation models use features that are defined locally, on each rule, there are efficient dynamic programming algorithms to perform translation with these grammars [Kasami, 1965]. In general, the integration of non-local features into the translation model can make translation NP-hard, requiring decoding approximations that limit the impact of these features.

In this thesis, we consider the impact of, and interaction between, two non-local features: the n-gram language model (LM) and the labels on rule nonterminal symbols in the Syntax-Augmented MT (SAMT) grammar [Zollmann and Venugopal, 2006]. While these features do not result in NP-hard search, they would lead to serious increases in wall-clock runtime if naive dynamic programming methods were applied.

We develop novel two-pass algorithms that make strong decoding approximations during a first-pass search, generating a hypergraph of sentence-spanning translation derivations. In a second pass, we use knowledge about non-local features to explore the hypergraph for alternative, potentially better translations. We use this approach to integrate the n-gram LM decoding feature as well as a non-local syntactic feature described below.

We then perform a systematic comparison of approaches to evaluate the relative impact of PSCFG methods over a strong phrase-based MT baseline, with a focus on the impact of the n-gram LM and syntactic labels. This comparison addresses important questions about the effectiveness of PSCFG methods for a variety of language and resource conditions. We learn that for language pairs that exhibit long-distance reordering, PSCFG methods deliver improvements over comparable phrase-based systems, and that SAMT labels result in additional small but consistent improvements, even in conjunction with strong n-gram LMs.

Finally, we propose a novel approach to using nonterminal labels in PSCFG decoding by extending the PSCFG formalism to represent hard label constraints as soft preferences. These preferences are used to compute a new decoding feature that reflects the probability that a derivation is syntactically well formed. This feature mitigates the effect of the commonly applied maximum a posteriori (MAP) approximation and can be discriminatively trained in concert with other model features. We report modest improvements in translation quality on a Chinese-to-English translation task.
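To illustrate the abstract's point that rule-local features admit efficient dynamic programming, the following minimal sketch finds the best-scoring derivation in a small acyclic hypergraph of the kind a first-pass search might produce. The `Edge` type, the node names, the rule strings, and the toy weights are all hypothetical and are not taken from the thesis. The code shows only the generic Viterbi recursion over a hypergraph, not the thesis's two-pass algorithms.

```python
from dataclasses import dataclass
from functools import lru_cache
import math


@dataclass(frozen=True)
class Edge:
    """One hyperedge: a rule application (hypothetical representation)."""
    head: str          # node produced by the rule, e.g. a labeled span
    tails: tuple       # child nodes the rule consumes (empty for lexical rules)
    rule: str          # human-readable rule, for display only
    log_weight: float  # rule-local score; no non-local features are involved


def best_derivation(edges, goal):
    """Return (log score, derivation tree) of the best derivation of `goal`.

    Assumes the hypergraph is acyclic. Because every score is local to one
    edge, the best derivation of a node depends only on the best derivations
    of its children, which is what makes memoized recursion sufficient.
    """
    incoming = {}
    for e in edges:
        incoming.setdefault(e.head, []).append(e)

    @lru_cache(maxsize=None)
    def best(node):
        best_score, best_tree = -math.inf, None
        for e in incoming.get(node, []):
            score = e.log_weight
            children = []
            for t in e.tails:
                child_score, child_tree = best(t)
                score += child_score
                children.append(child_tree)
            if score > best_score:
                best_score, best_tree = score, (e.rule, tuple(children))
        return best_score, best_tree

    return best(goal)


# Toy two-word sentence with a monotone and a swapping top-level rule.
edges = [
    Edge("X[0,1]", (), "X -> <a, A>", math.log(0.5)),
    Edge("X[1,2]", (), "X -> <b, B>", math.log(0.4)),
    Edge("S[0,2]", ("X[0,1]", "X[1,2]"), "S -> <X1 X2, X1 X2>", math.log(0.6)),
    Edge("S[0,2]", ("X[0,1]", "X[1,2]"), "S -> <X1 X2, X2 X1>", math.log(0.3)),
]
score, tree = best_derivation(edges, "S[0,2]")
```

Once a feature such as an n-gram LM score depends on words contributed by neighboring rules, the score of an edge is no longer a function of the edge alone, so this simple recursion no longer returns the true optimum without enlarging the node states. That is the setting in which the abstract's decoding approximations and second-pass search apply.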

Bibliographic record

  • Author: Venugopal, Ashish
  • Author affiliation: Carnegie Mellon University
  • Degree-granting institution: Carnegie Mellon University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 132
  • Original format: PDF
  • Language: English