Journal: Machine Translation

Hybrid data-driven models of machine translation



This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system can outperform a phrase-based SMT (PBSMT) system built from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT subsentential alignments can outperform both baseline translation models for French-English translation. In this paper, we show that similar gains can be obtained by constructing a hybrid 'statistical' EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, although all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid 'example-based' SMT system yields a system that outperforms both baseline systems from which it is derived. We provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this, too, improves translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to overall translation quality.
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit from integrating the strengths of the other; dogged adherence to a single approach will lead to inferior systems.