Conference: Workshop on multiword expressions: from theory to application

Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets

Abstract

An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Using this grammar, a set of impostor sentences is then created for each English sentence to simulate confusions that would arise if the system were to process an (unavailable) input whose correct English translation is that sentence. An LM is then trained to discriminate between the original sentences and the impostors. The procedure is applied to the IWSLT Chinese-to-English translation task, and promising improvements on a state-of-the-art MT system are demonstrated.
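The pipeline described in the abstract (generate impostors for each sentence, then train a model to prefer the original) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the grammar format, the feature set, or the training objective. A small hand-written fragment table (`CONFUSIONS`, hypothetical) stands in for the English-to-English synchronous grammar, bigram counts stand in for the LM features, and a simple perceptron update stands in for the discriminative training step.

```python
from collections import Counter

# Hypothetical confusion rules: fragments that might compete as translations
# of the same source fragment. A real system would derive these from the
# baseline MT system; these entries are invented for illustration only.
CONFUSIONS = {
    "on the table": ["at the table", "on table"],
    "is": ["are"],
}

def impostors(sentence, rules=CONFUSIONS):
    """Return sentences obtained by rewriting one matching fragment."""
    tokens = sentence.split()
    out = set()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            fragment = " ".join(tokens[i:j])
            for alt in rules.get(fragment, []):
                out.add(" ".join(tokens[:i] + alt.split() + tokens[j:]))
    out.discard(sentence)
    return sorted(out)

def features(sentence):
    """Bigram counts, with sentence-boundary markers."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(zip(tokens, tokens[1:]))

def score(weights, sentence):
    """Linear model score: weighted sum of bigram counts."""
    return sum(weights.get(f, 0.0) * c for f, c in features(sentence).items())

def train(corpus, epochs=5):
    """Perceptron-style training (an assumed stand-in objective):
    whenever the best-scoring impostor ties or beats the original,
    move the weights toward the original and away from that impostor."""
    weights = {}
    for _ in range(epochs):
        for original in corpus:
            candidates = impostors(original)
            if not candidates:
                continue
            best = max(candidates, key=lambda s: score(weights, s))
            if score(weights, best) >= score(weights, original):
                for f, c in features(original).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in features(best).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights

corpus = ["the book is on the table", "the cat is here"]
weights = train(corpus)
```

Note that only monolingual English text is needed for training, which is the sense in which the procedure is unsupervised: the impostors are simulated rather than taken from actual MT output on parallel data.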
