Venue: Conference on Empirical Methods in Natural Language Processing; Conference on Computational Natural Language Learning

Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models


Abstract

In this paper, we propose a novel translation model (TM) based cross-lingual data selection model for language model (LM) adaptation in statistical machine translation (SMT), progressing from word models to phrase models. Given a source sentence in the translation task, the model directly estimates the probability that a sentence in the target LM training corpus is similar to it. Compared with traditional approaches, which rely on first-pass translation hypotheses, the cross-lingual data selection model avoids the problem of noise proliferation. Furthermore, the phrase TM based cross-lingual data selection model is more effective than traditional approaches based on bag-of-words models and word-based TMs, because it captures contextual information by modeling the selection of a phrase as a whole. Experiments conducted on large-scale data sets demonstrate that our approach significantly outperforms state-of-the-art approaches on both LM perplexity and SMT performance.
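
The selection idea in the abstract, scoring each target-side LM training sentence directly against a source sentence, can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes an IBM Model 1 style word-based score, a toy lexical translation table (`lex_prob`), and a probability floor for unseen word pairs. The function names `tm_log_score` and `select_sentences` are hypothetical. The phrase-based variant described above is not shown.

```python
import math


def tm_log_score(source_tokens, target_tokens, lex_prob, floor=1e-6):
    """Length-normalized log-likelihood of a target sentence given a source
    sentence, in the style of IBM Model 1 (an illustrative assumption)."""
    if not source_tokens or not target_tokens:
        return float("-inf")
    total = 0.0
    for t in target_tokens:
        # Average the lexical translation probability of t over all source words.
        p = sum(lex_prob.get((s, t), 0.0) for s in source_tokens) / len(source_tokens)
        # Floor the probability so unseen word pairs do not yield log(0).
        total += math.log(max(p, floor))
    # Normalize by target length so long sentences are not penalized unfairly.
    return total / len(target_tokens)


def select_sentences(source_tokens, target_corpus, lex_prob, top_n=2):
    """Return the top_n target sentences ranked by the word-based TM score."""
    scored = [(tm_log_score(source_tokens, sent, lex_prob), sent)
              for sent in target_corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:top_n]]


# Toy data (hypothetical): French source sentence, English target LM corpus.
lex_prob = {
    ("chat", "cat"): 0.9,
    ("dort", "sleeps"): 0.8,
    ("chat", "the"): 0.05,
    ("dort", "the"): 0.05,
}
source_sentence = ["chat", "dort"]
target_corpus = [
    ["the", "cat", "sleeps"],
    ["stock", "prices", "fell"],
    ["the", "dog", "barks"],
]

selected = select_sentences(source_sentence, target_corpus, lex_prob, top_n=2)
print(selected)
```

The selected sentences would then serve as training data for an adapted LM. Because the score is computed from the source sentence itself, no first-pass translation hypothesis is needed, which is the property the abstract credits with avoiding noise proliferation.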
