Journal: Machine Translation

Bilingual LSA-based adaptation for statistical machine translation

Abstract

We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis (LSA). Bilingual LSA enables latent topic distributions to be transferred efficiently across languages by enforcing a one-to-one topic correspondence during training. Within the proposed bilingual LSA framework, model adaptation is performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution to an n-gram language model of the target language and to the translation lexicon via marginal adaptation. The background phrase table is enhanced with additional phrase scores computed using the adapted translation lexicon. The proposed framework also supports rapid bootstrapping of LSA models for new languages from a source LSA model of another language. Our approach is evaluated on the Chinese-English MT06 test set using a medium-scale SMT system and the GALE SMT system, measured by BLEU and NIST scores. Both scores improve on both systems when the adapted language model and the adapted translation lexicon are applied individually. When the two are applied simultaneously, the gain is additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant for the medium-scale SMT system, while the gain in the NIST score is statistically significant for the GALE SMT system.
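The adaptation step described in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the commonly used unigram-scaling form of marginal adaptation, in which the background probability P(w | h) is rescaled by the ratio of the adapted to the background unigram probability raised to a power beta, then renormalized. All names, the toy numbers, and the value of `beta` are hypothetical and are not taken from the paper.

```python
import numpy as np

def adapted_unigram(topic_posterior, topic_word):
    """Target-side unigram marginal: sum over topics k of P(k | source text) * P(w | k).

    Relies on the one-to-one topic correspondence: the posterior inferred on the
    source text is reused directly with the target-language topic-word table.
    """
    return topic_posterior @ topic_word

def marginal_adaptation(bg_cond, bg_unigram, lsa_unigram, beta=0.7):
    """Rescale background P(w | h) by (lsa_unigram / bg_unigram) ** beta and renormalize.

    This unigram-scaling form is an assumption; beta is a hypothetical tuning weight.
    """
    scores = bg_cond * (lsa_unigram / bg_unigram) ** beta
    return scores / scores.sum()

# Toy setup (hypothetical): 2 shared topics, 3 target-language words.
topic_word = np.array([[0.7, 0.2, 0.1],    # P(w | topic 0)
                       [0.1, 0.2, 0.7]])   # P(w | topic 1)
topic_posterior = np.array([0.9, 0.1])     # P(topic | source text), assumed already inferred
bg_unigram = np.array([0.4, 0.2, 0.4])     # background P(w)
bg_cond = np.array([0.3, 0.4, 0.3])        # background P(w | h) for one fixed history h

lsa_unigram = adapted_unigram(topic_posterior, topic_word)
adapted_cond = marginal_adaptation(bg_cond, bg_unigram, lsa_unigram)
print(lsa_unigram, adapted_cond)
```

In this toy case the source text is dominated by topic 0, so the word favored by topic 0 gains probability under the adapted model and the word favored by topic 1 loses it. The same rescaling idea could be applied to translation lexicon entries, though how the paper does so is not specified in the abstract.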
