Journal: Machine Translation

Automatically generated parallel treebanks and their exploitability in machine translation


Abstract

Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for improvements to the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically-motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PB-SMT) system leads to significant improvements in translation quality. Following this, we describe experiments in which we exploit the information encoded in the parallel treebank in other areas of the PB-SMT framework, while investigating the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the possibility of exploiting automatically-generated parallel treebanks further in syntax-aware paradigms of MT.
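The abstract does not name the tools or data formats used, so the following is only a minimal sketch of the general idea: reading phrase pairs off node links in an aligned tree pair and merging them with the phrase pairs of a baseline system. The `Node` class, the `extract_phrase_pairs` and `combine` helpers, the toy English-French example, and the choice to merge by adding counts are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A hypothetical constituency-tree node; leaf nodes carry a word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None

    def leaves(self) -> List[str]:
        """Return the words dominated by this node, left to right."""
        if self.word is not None:
            return [self.word]
        words: List[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words


def extract_phrase_pairs(links: List[Tuple[Node, Node]]) -> List[Tuple[str, str]]:
    """Turn each (source node, target node) link into a pair of surface strings."""
    return [(" ".join(s.leaves()), " ".join(t.leaves())) for s, t in links]


def combine(baseline: Counter, treebank_pairs: List[Tuple[str, str]]) -> Counter:
    """Assumed merging strategy: add treebank pairs to the baseline counts."""
    combined = Counter(baseline)
    combined.update(treebank_pairs)
    return combined


# Toy tree pair: "the green house" <-> "la maison verte"
s_det, s_adj, s_n = Node("D", word="the"), Node("ADJ", word="green"), Node("N", word="house")
t_det, t_n, t_adj = Node("D", word="la"), Node("N", word="maison"), Node("ADJ", word="verte")
src_np = Node("NP", [s_det, s_adj, s_n])
tgt_np = Node("NP", [t_det, t_n, t_adj])

# Node links that a sub-tree aligner might produce (assumed here by hand).
links = [(src_np, tgt_np), (s_det, t_det), (s_adj, t_adj), (s_n, t_n)]

pairs = extract_phrase_pairs(links)
baseline_counts = Counter({("the", "la"): 3, ("house", "maison"): 1})
combined = combine(baseline_counts, pairs)

for (src, tgt), count in sorted(combined.items()):
    print(f"{src} ||| {tgt} ||| {count}")
```

Because every extracted pair corresponds to a linked pair of constituents, the added phrase pairs are syntactically motivated, unlike the purely span-based pairs of a standard phrase-based system. In a real system the merged counts would then be re-estimated into translation probabilities.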
