IEEE Transactions on Audio, Speech, and Language Processing

Improving Statistical Machine Translation Using Bayesian Word Alignment and Gibbs Sampling


Abstract

We present a Bayesian approach to word alignment inference in IBM Models 1 and 2. In the original approach, word translation probabilities (i.e., model parameters) are estimated using the expectation-maximization (EM) algorithm. In the proposed approach, they are random variables with a prior and are integrated out during inference. We use Gibbs sampling to infer the word alignment posteriors. The inferred word alignments are compared against EM and variational Bayes (VB) inference in terms of their end-to-end translation performance on several language pairs and corpus types of up to 15 million sentence pairs. We show that Bayesian inference outperforms both EM and VB in the majority of test cases. Further analysis reveals that the proposed method effectively addresses the high-fertility rare word problem in EM and the unaligned rare word problem in VB, achieves higher agreement and vocabulary coverage rates than both, and leads to smaller phrase tables.
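
The central computation described in the abstract is a collapsed Gibbs update over alignment variables: the Dirichlet-distributed translation probabilities are integrated out, so the sampler works only with co-occurrence counts. Below is a minimal sketch of that idea for IBM Model 1 in Python. It is not the authors' implementation; the function name gibbs_align, the hyperparameter theta, the NULL-token handling, and the count bookkeeping are illustrative assumptions.

```python
import random
from collections import defaultdict


def gibbs_align(bitext, theta=0.01, iters=20, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 word alignment (sketch).

    bitext: list of (source_tokens, target_tokens) pairs.  A NULL token is
    prepended to every source sentence so target words may stay unaligned.
    Translation probabilities t(f|e) get a symmetric Dirichlet(theta) prior
    and are integrated out; only alignment links and counts are stored.
    """
    rng = random.Random(seed)
    target_vocab = {f for _, fs in bitext for f in fs}
    V = len(target_vocab)

    pair_count = defaultdict(int)   # c(e, f) implied by current alignments
    src_count = defaultdict(int)    # c(e, *) implied by current alignments

    # Random initialisation of the alignment variables a_j.
    alignments = []
    for es, fs in bitext:
        es = ["<NULL>"] + list(es)
        a = [rng.randrange(len(es)) for _ in fs]
        for j, f in enumerate(fs):
            pair_count[(es[a[j]], f)] += 1
            src_count[es[a[j]]] += 1
        alignments.append(a)

    for _ in range(iters):
        for s, (es, fs) in enumerate(bitext):
            es = ["<NULL>"] + list(es)
            a = alignments[s]
            for j, f in enumerate(fs):
                # Remove the link being resampled from the counts.
                pair_count[(es[a[j]], f)] -= 1
                src_count[es[a[j]]] -= 1
                # Collapsed conditional: prior-smoothed relative counts.
                weights = [
                    (pair_count[(e, f)] + theta) / (src_count[e] + theta * V)
                    for e in es
                ]
                r = rng.random() * sum(weights)
                i = 0
                while i < len(weights) - 1 and r > weights[i]:
                    r -= weights[i]
                    i += 1
                a[j] = i
                pair_count[(es[i], f)] += 1
                src_count[es[i]] += 1
    return alignments
```

Because the parameters are integrated out, each resampled link is scored by prior-smoothed counts rather than by point-estimated translation probabilities; this smoothing is the intuition behind why such a sampler can avoid the sharply peaked distributions that EM tends to assign to rare source words.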