IEEE Transactions on Audio, Speech, and Language Processing

Improving Statistical Machine Translation Using Bayesian Word Alignment and Gibbs Sampling


Abstract

We present a Bayesian approach to word alignment inference in IBM Models 1 and 2. In the original approach, word translation probabilities (i.e., model parameters) are estimated using the expectation-maximization (EM) algorithm. In the proposed approach, they are random variables with a prior and are integrated out during inference. We use Gibbs sampling to infer the word alignment posteriors. The inferred word alignments are compared against EM and variational Bayes (VB) inference in terms of their end-to-end translation performance on several language pairs and corpus types of up to 15 million sentence pairs. We show that Bayesian inference outperforms both EM and VB in the majority of test cases. Further analysis reveals that the proposed method effectively addresses the high-fertility rare word problem in EM and the unaligned rare word problem in VB, achieves higher agreement and vocabulary coverage rates than both, and leads to smaller phrase tables.
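
The central computation described in the abstract is a collapsed Gibbs update over alignment variables: the Dirichlet-distributed translation probabilities are integrated out, so the sampler works only with co-occurrence counts. Below is a minimal sketch of that idea for IBM Model 1 in Python. It is not the authors' implementation; the function name gibbs_align, the hyperparameter theta, the NULL-token handling, and the count bookkeeping are illustrative assumptions.

```python
import random
from collections import defaultdict


def gibbs_align(bitext, theta=0.01, iters=20, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 word alignment (sketch).

    bitext: list of (source_tokens, target_tokens) pairs.  A NULL token is
    prepended to every source sentence so target words may stay unaligned.
    Translation probabilities t(f|e) get a symmetric Dirichlet(theta) prior
    and are integrated out; only alignment links and counts are stored.
    """
    rng = random.Random(seed)
    target_vocab = {f for _, fs in bitext for f in fs}
    V = len(target_vocab)

    pair_count = defaultdict(int)   # c(e, f) implied by current alignments
    src_count = defaultdict(int)    # c(e, *) implied by current alignments

    # Random initialisation of the alignment variables a_j.
    alignments = []
    for es, fs in bitext:
        es = ["<NULL>"] + list(es)
        a = [rng.randrange(len(es)) for _ in fs]
        for j, f in enumerate(fs):
            pair_count[(es[a[j]], f)] += 1
            src_count[es[a[j]]] += 1
        alignments.append(a)

    for _ in range(iters):
        for s, (es, fs) in enumerate(bitext):
            es = ["<NULL>"] + list(es)
            a = alignments[s]
            for j, f in enumerate(fs):
                # Remove the link being resampled from the counts.
                pair_count[(es[a[j]], f)] -= 1
                src_count[es[a[j]]] -= 1
                # Collapsed conditional: prior-smoothed relative counts.
                weights = [
                    (pair_count[(e, f)] + theta) / (src_count[e] + theta * V)
                    for e in es
                ]
                r = rng.random() * sum(weights)
                i = 0
                while i < len(weights) - 1 and r > weights[i]:
                    r -= weights[i]
                    i += 1
                a[j] = i
                pair_count[(es[i], f)] += 1
                src_count[es[i]] += 1
    return alignments
```

Because the parameters are integrated out, each resampled link is scored by prior-smoothed counts rather than by point-estimated translation probabilities; this smoothing is the intuition behind why such a sampler can avoid the sharply peaked distributions that EM tends to assign to rare source words.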