9th Workshop on Statistical Machine Translation

Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules



Abstract

This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair is English-French, with a focus on French as the target language. The French-to-English direction is also covered, based on the word alignment computed in the English-to-French direction. Large language and translation models are built using all the datasets provided by the shared task organisers, as well as monolingual data from the LDC. To build the translation models, we apply a two-step data selection method based on bilingual cross-entropy difference and vocabulary saturation to each parallel corpus individually. Synthetic translation rules are extracted from the development sets and used to train an additional translation model. We then interpolate the translation models, minimising perplexity on the development sets, to obtain our final SMT system. Our English-to-French submission was ranked second among nine teams and twenty submissions in total.
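The abstract names the two selection criteria but not how they are combined or which models score the sentences. The sketch below illustrates the idea under assumptions of our own: add-one-smoothed unigram language models stand in for the n-gram models normally used, the out-of-domain models are trained on the candidate corpus itself, ranking by bilingual cross-entropy difference comes first, and a unigram vocabulary saturation filter is then applied to the top-ranked pairs. All function names and the `keep_ratio` and `threshold` parameters are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return an add-one-smoothed unigram log-probability function (hypothetical stand-in for an n-gram LM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for a single unknown-word class

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-word cross-entropy (in nats) of a sentence under a unigram model."""
    words = sentence.split()
    if not words:
        return 0.0
    return -sum(logprob(w) for w in words) / len(words)


def bilingual_ce_diff(pair, in_src, out_src, in_tgt, out_tgt):
    """Source plus target cross-entropy difference; lower means more in-domain-like."""
    src, tgt = pair
    return ((cross_entropy(in_src, src) - cross_entropy(out_src, src))
            + (cross_entropy(in_tgt, tgt) - cross_entropy(out_tgt, tgt)))


def vocabulary_saturation(pairs, threshold=2):
    """Keep a pair only if some source or target word was seen fewer than
    `threshold` times in the pairs kept so far (unigram-only simplification)."""
    seen_src, seen_tgt = Counter(), Counter()
    kept = []
    for src, tgt in pairs:
        s_words, t_words = src.split(), tgt.split()
        if (any(seen_src[w] < threshold for w in s_words)
                or any(seen_tgt[w] < threshold for w in t_words)):
            kept.append((src, tgt))
            seen_src.update(s_words)
            seen_tgt.update(t_words)
    return kept


def two_step_select(pairs, in_domain, keep_ratio=0.5, threshold=2):
    """Rank one parallel corpus by cross-entropy difference, keep the top share,
    then drop pairs whose vocabulary is already saturated."""
    in_src = train_unigram([s for s, _ in in_domain])
    in_tgt = train_unigram([t for _, t in in_domain])
    out_src = train_unigram([s for s, _ in pairs])
    out_tgt = train_unigram([t for _, t in pairs])
    # Step 1: sort ascending, so the most in-domain-like pairs come first.
    ranked = sorted(pairs, key=lambda p: bilingual_ce_diff(p, in_src, out_src, in_tgt, out_tgt))
    top = ranked[:max(1, int(len(ranked) * keep_ratio))]
    # Step 2: vocabulary saturation over the ranked, truncated list.
    return vocabulary_saturation(top, threshold)
```

Because the abstract says each parallel corpus is considered individually, a caller would presumably run `two_step_select` once per corpus, using the development set as `in_domain`.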

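The abstract states that the translation models are interpolated so as to minimise perplexity on the development sets, but not which optimiser is used. One standard way to minimise the perplexity of a linear mixture on held-out data is expectation-maximisation over the mixture weights, sketched below. We assume that each development-set event (for example, a phrase pair extracted from the development set) has already been scored by every component model with a strictly positive, smoothed probability. The names `probs`, `perplexity` and `em_interpolation_weights` are hypothetical.

```python
import math


def perplexity(weights, probs):
    """Perplexity of the linear mixture; probs[i][k] is model k's probability for dev event i."""
    log_sum = sum(math.log(sum(w * p for w, p in zip(weights, row))) for row in probs)
    return math.exp(-log_sum / len(probs))


def em_interpolation_weights(probs, iterations=50):
    """EM for linear-interpolation weights; each step cannot increase dev-set perplexity.
    Assumes every mixture probability is strictly positive."""
    k = len(probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * k
        for row in probs:
            mix = sum(w * p for w, p in zip(weights, row))
            for j in range(k):
                # E-step: posterior responsibility of model j for this event
                expected[j] += weights[j] * row[j] / mix
        # M-step: new weights are the average responsibilities (they sum to 1)
        weights = [e / len(probs) for e in expected]
    return weights
```

EM is used here only because it is simple and monotone for mixture weights; a gradient-based optimiser over the same perplexity objective would be an equally reasonable choice.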
