
Dependency Structures for Statistical Machine Translation.


Abstract

Dependency structures represent a sentence as a set of dependency relations. Normally, the dependency relations form a tree that connects all the words in a sentence. One of the defining characteristics of dependency structures is their ability to bring long-distance dependencies between words into local dependency structures. Another main attraction of dependency structures is their close correspondence to meaning. This thesis focuses on integrating dependency structures into machine translation components, including the decoding algorithm, reordering models, confidence measures, and sentence simplification.

First, we develop four novel cohesive soft constraints for a phrase-based decoder, namely exhaustive interruption check, interruption count, exhaustive interruption count, and rich interruption constraints. To assess the robustness and effectiveness of the proposed constraints, we conduct experiments on four language pairs: English-{Iraqi, Spanish} and {Arabic, Chinese}-English. The improvements range from 0.4 to 1.8 BLEU points. These experiments also cover a wide range of training corpus sizes, from 500K sentence pairs up to 10 million sentence pairs. Furthermore, to show the effectiveness of the proposed methods, we apply them to systems using a 2.7-billion-word 5-gram LM, different reordering models, and different dependency parsers.

Second, to go beyond cohesive soft constraints, we investigate efficient algorithms for learning and decoding with source-side dependency tree reordering models. We propose a novel source-tree reordering model that exploits dependency subtree inside/outside movements and cohesive soft constraints. These movements and constraints enable us to efficiently capture the subtree-to-subtree transitions observed both on the source side of word-aligned training data and at decoding time. Representing subtree movements as features allows MERT to train the corresponding weights for these features relative to others in the model. Moreover, experimental results on English-{Iraqi, Spanish} show improvements of +0.8 BLEU and -1.4 TER on English-Spanish and +0.8 BLEU and -2.3 TER on English-Iraqi.

Third, we develop Goodness, a novel framework to predict word-level and sentence-level machine translation confidence with dependency structures. The framework allows MT systems to inform users which words are likely translated correctly and how confident the system is about the whole sentence. Experimental results show that the MT error prediction accuracy increases from 69.1 to 72.2 in F-score. The Pearson correlation between the proposed confidence measure and the human-targeted translation edit rate (HTER) is 0.6. TER reductions of between 0.4 and 0.9 are obtained on the n-best list reranking task using the proposed confidence measure. We also present a visualization prototype of MT errors at the word and sentence levels, with the objective of improving post-editor productivity.

Finally, inspired by work in summarization, we propose TriS, a novel framework to simplify source sentences before translating them. We build a statistical sentence simplification system with log-linear models. In contrast to state-of-the-art methods that drive the sentence simplification process by hand-written linguistic rules, our method uses a margin-based discriminative learning algorithm that operates on a feature set. The feature set is defined on statistics of dependency structures as well as the surface form and syntactic structures of sentences. A stack decoding algorithm is developed to efficiently generate and search simplification hypotheses. Experimental results show that the simplified text produced by the proposed system is 1.7 Flesch-Kincaid grade levels lower than the original text. A comparison of a state-of-the-art rule-based system with the proposed system demonstrates an improvement of 0.2, 0.6, and 4.5 points in ROUGE-2, ROUGE-4, and AveF 10, respectively. We present subjective evaluations of the simplified translation quality for an English-German MT system.
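The abstract names an "interruption count" among its cohesive soft constraints but does not define it. The sketch below is only one plausible reading, not the thesis's own formulation: it assumes an interruption occurs when a phrase-based decoder starts translating a source span that lies entirely outside a dependency subtree whose translation has begun but is not yet complete. The function names, the head-array encoding of the tree (`-1` marks the root), and the half-open span convention are all illustrative assumptions.

```python
from typing import List, Set, Tuple


def subtree_spans(heads: List[int]) -> List[Set[int]]:
    """For every word i, collect the positions in the subtree rooted at i.

    Assumes `heads` encodes a well-formed tree: heads[i] is the parent of
    word i, and -1 marks the root (no cycles).
    """
    subtrees = [{i} for i in range(len(heads))]
    for i in range(len(heads)):
        # Walk from word i up to the root, adding i to each ancestor's subtree.
        parent = heads[i]
        while parent != -1:
            subtrees[parent].add(i)
            parent = heads[parent]
    return subtrees


def interruption_count(
    heads: List[int], covered: Set[int], new_span: Tuple[int, int]
) -> int:
    """Count open subtrees that the next phrase would leave unfinished.

    `covered` holds source positions already translated; `new_span` is the
    half-open source span [start, end) of the phrase about to be translated.
    A subtree is "open" when some, but not all, of its words are covered.
    Assumption: it counts as interrupted only if the new span is disjoint
    from it.
    """
    new_words = set(range(new_span[0], new_span[1]))
    count = 0
    for words in subtree_spans(heads):
        started = bool(words & covered)
        finished = words <= covered
        if started and not finished and not (words & new_words):
            count += 1
    return count


# "the old man saw a dog": the->man, old->man, man->saw, saw=root, a->dog, dog->saw
heads = [2, 2, 3, -1, 5, 3]

# After translating "the", jumping to "a dog" leaves the subtree of "man" open.
jump_away = interruption_count(heads, covered={0}, new_span=(4, 6))

# Continuing with "old man" stays inside that subtree.
stay_inside = interruption_count(heads, covered={0}, new_span=(1, 3))
```

Used as a soft constraint, a count like this would enter the decoder's log-linear model as one more feature whose weight is tuned alongside the others, rather than as a hard filter on hypotheses.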

Bibliographic record

  • Author

    Bach, Nguyen

  • Author affiliation

    Carnegie Mellon University

  • Degree-granting institution: Carnegie Mellon University
  • Subject: Language Linguistics; Information Technology; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 141 p.
  • Total pages: 141
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords:

  • Date indexed: 2022-08-17 11:42:31
