Journal: Machine Translation

Factored bilingual n-gram language models for statistical machine translation

Abstract

We present an extension of n-gram-based translation models that builds on factored language models (FLMs). The translation units used in the n-gram-based approach to statistical machine translation (SMT) are mappings between sequences of raw words, and translation model probabilities are estimated through standard language modeling of these bilingual units. As with other translation model approaches (phrase-based or hierarchical), the sparseness of the modeled units therefore leads to unreliable probability estimates, even when large bilingual corpora are available. To tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. We show that FLMs can also be applied successfully to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
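
The idea of backing off from raw words to more general representations can be illustrated with a small sketch. This is not the method described in the abstract: it uses a fixed back-off order (word, then lemma, then morphological class) with a constant penalty, in the style of "stupid back-off", and the scores are not normalized probabilities. The `Unit` type, the factor names, the `alpha` value, and the toy Spanish-English corpus are all assumptions made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Unit(NamedTuple):
    """A bilingual unit with hypothetical factors (names are illustrative)."""
    word: str   # raw source|target pair, e.g. "casas|houses"
    lemma: str  # lemma pair, e.g. "casa|house"
    tag: str    # morphological class pair, e.g. "N.pl|N.pl"


# Back-off order for the context unit: most specific to least specific.
FACTORS = ("word", "lemma", "tag")


class BackoffBigram:
    """Toy bigram scorer over bilingual units with factored context back-off."""

    def __init__(self, sentences, alpha=0.4):
        self.alpha = alpha  # assumed constant penalty per back-off step
        self.pair = {f: Counter() for f in FACTORS}  # (context value, word) counts
        self.ctx = {f: Counter() for f in FACTORS}   # context value counts
        self.unigram = Counter()
        self.total = 0
        for sent in sentences:
            for prev, cur in zip(sent, sent[1:]):
                for f in FACTORS:
                    value = getattr(prev, f)
                    self.pair[f][(value, cur.word)] += 1
                    self.ctx[f][value] += 1
            for unit in sent:
                self.unigram[unit.word] += 1
                self.total += 1

    def score(self, prev, cur):
        """Score cur given prev, generalizing prev until a count is found."""
        penalty = 1.0
        for f in FACTORS:
            value = getattr(prev, f)
            seen = self.pair[f][(value, cur.word)]
            if seen > 0:
                return penalty * seen / self.ctx[f][value]
            penalty *= self.alpha
        # Last resort: add-one smoothed unigram score.
        vocab = len(self.unigram) + 1
        return penalty * (self.unigram[cur.word] + 1) / (self.total + vocab)


# Toy training data: two unit sequences (entirely made up).
la = Unit("la|the", "el|the", "DET|DET")
las = Unit("las|the", "el|the", "DET|DET")
una = Unit("una|a", "uno|a", "DET|DET")
casa = Unit("casa|house", "casa|house", "N.sg|N.sg")
casas = Unit("casas|houses", "casa|house", "N.pl|N.pl")

model = BackoffBigram([[la, casa], [las, casas]])

print(model.score(la, casa))   # context seen at the word level
print(model.score(las, casa))  # backs off to the lemma of the context
print(model.score(una, casa))  # backs off to the morphological class
```

In this sketch an unseen word-level context still receives a count-based score as long as its lemma or class was observed, which is the kind of robustness to sparse units that the abstract attributes to more general word representations. A real FLM can combine several back-off paths and uses proper discounting, neither of which is attempted here.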
