Lexical features for statistical machine translation.


Abstract

In modern phrasal and hierarchical statistical machine translation systems, two major features model translation: rule translation probabilities and lexical smoothing scores. The rule translation probabilities are computed as maximum likelihood estimates (MLEs) of an entire source (or target) phrase translating to a target (or source) phrase. The lexical smoothing scores are also a likelihood estimate of a source (target) phrase translating to a target (source) phrase, but they are computed using independent word-to-word translation probabilities. Intuitively, it would seem that the lexical smoothing score is a less powerful estimate of translation likelihood due to this independence assumption, but I present the somewhat surprising result that lexical smoothing is far more important to the quality of a state-of-the-art hierarchical SMT system than rule translation probabilities. I posit that this is due to a fundamental data sparsity problem: The average word-to-word translation is seen many more times than the average phrase-to-phrase translation, so the word-to-word translation probabilities (or lexical probabilities) are far better estimated.

Motivated by this result, I present a number of novel methods for modifying the lexical probabilities to improve the quality of our MT output. First, I examine two methods of lexical probability biasing, where for each test document, a set of secondary lexical probabilities are extracted and interpolated with the primary lexical probability distribution. Biasing each document with the probabilities extracted from its own first-pass decoding output provides a small but consistent gain of about 0.4 BLEU.

Second, I contextualize the lexical probabilities by factoring in additional information such as the previous or next word. The key to the success of this context-dependent lexical smoothing is a backoff model, where our "trust" of a context-dependent probability estimation is directly proportional to how many times it was seen in the training. In this way, I avoid the estimation problem seen in translation rules, where the amount of context is high but the probability estimation is inaccurate. When using the surrounding words as context, this feature provides a gain of about 0.6 BLEU on Arabic and Chinese.

Finally, I describe several types of discriminatively trained lexical features, along with a new optimization procedure called Expected-BLEU optimization. This new optimization procedure is able to robustly estimate weights for thousands of decoding features, which can in effect discriminatively optimize a set of lexical probabilities to maximize BLEU. I also describe two other discriminative feature types, one of which is the part-of-speech analogue to lexical probabilities, and the other of which estimates training corpus weights based on lexical translations. The discriminative features produce a gain of 0.8 BLEU on Arabic and 0.4 BLEU on Chinese.
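
The abstract does not give formulas, so the following Python sketch is only an illustration of the three ideas it names, not the thesis's actual method. It assumes an IBM Model 1 style lexical score (each target word scored by the average word-to-word probability over the source words), linear interpolation for document biasing, and a backoff weight of the form count / (count + k), which grows with the count but is an assumed form. All function names, the `floor` and `k` parameters, and the toy probabilities are hypothetical.

```python
import math


def lexical_score(src_words, tgt_words, t_prob, floor=1e-9):
    """Assumed Model-1-style lexical smoothing score: log p(tgt | src).

    Each target word is scored independently by the average of its
    word-to-word probabilities over all source words.
    """
    log_p = 0.0
    for t in tgt_words:
        avg = sum(t_prob.get((s, t), 0.0) for s in src_words) / len(src_words)
        # Floor the probability so unseen pairs do not produce log(0).
        log_p += math.log(max(avg, floor))
    return log_p


def interpolate(primary, secondary, lam):
    """Bias a primary lexical distribution toward a secondary one.

    lam is the (assumed) interpolation weight given to the secondary,
    per-document probabilities.
    """
    keys = set(primary) | set(secondary)
    return {
        k: (1.0 - lam) * primary.get(k, 0.0) + lam * secondary.get(k, 0.0)
        for k in keys
    }


def backoff(p_context, p_plain, count, k=5.0):
    """Mix a context-dependent estimate with the context-free one.

    The trust placed in the context-dependent estimate grows with how
    often that context was seen in training (assumed form: c / (c + k)).
    """
    trust = count / (count + k)
    return trust * p_context + (1.0 - trust) * p_plain


# Toy word-to-word table (hypothetical values).
primary = {("maison", "house"): 0.6, ("maison", "home"): 0.4}
# Probabilities extracted from a document's first-pass output (hypothetical).
secondary = {("maison", "home"): 1.0}

biased = interpolate(primary, secondary, lam=0.25)
score = lexical_score(["la", "maison"], ["house"], primary)
unseen_context = backoff(p_context=0.9, p_plain=0.5, count=0)
```

With a count of zero the backoff returns the context-free probability unchanged, which is the behavior the abstract describes: a context-dependent estimate that was never observed receives no trust.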


Bibliographic record

Author: Devlin, Jacob
Author affiliation: University of Maryland, College Park
Degree-granting institution: University of Maryland, College Park
Discipline: Computer Science
Degree: M.S.
Year: 2009
Pages: 91 p.
Total pages: 91
Original format: PDF
Language of text: English (eng)

Similar literature


1. A deep source-context feature for lexical selection in statistical machine translation [J]. Gupta Parth, Costa-Jussa Marta R., Rosso Paolo. Pattern Recognition Letters, 2016, issue maya1.
2. Philipp Koehn: Statistical Machine Translation. [J]. Applied Linguistics, 2011, issue 3.
3. Graph-based lexicalized reordering models for statistical machine translation [J]. Su Jinsong, Liu Yang, Liu Qun. Communications, China, 2014, issue 5.
4. German Compounds and Statistical Machine Translation. Can they get along? [C]. Carla Parra Escartin, Stephan Peitz, Hermann Ney. 10th Workshop on multiword expressions, 2014.
5. A lexical semantic study of Four-Character Sino-Japanese compounds and its application to machine translation. [D]. Kudo, Mayo. 2007.
6. Do statistical segmentation abilities predict lexical-phonological and lexical-semantic abilities in children with and without SLI? [O]. Elina Mainela-Arnold, Julia L. Evans.
7. A Deep source-context feature for lexical selection in statistical machine translation [O]. Gupta, Parth, Ruiz Costa-Jussà, Marta, Rosso, Paolo. 2018.
