Conference: Computational Linguistics and Intelligent Text Processing

Exploiting Parallel Treebanks to Improve Phrase-Based Statistical Machine Translation


Abstract

Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.
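The core idea of adding treebank-derived phrase pairs to a baseline translation model can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not say how the two sets of phrase pairs are combined or scored, so merging raw counts and re-estimating relative frequencies is an assumption made here for illustration. The tree encoding, the `links` format, and all function names are likewise hypothetical.

```python
from collections import Counter

# Assumed encoding: a tree node is (label, [children]); a leaf is a plain token string.
# A node is addressed by its path, i.e. the tuple of child indices from the root.

def node_yields(tree, path=()):
    """Map every node path to the list of tokens that node dominates."""
    if isinstance(tree, str):
        return {path: [tree]}
    _label, children = tree
    yields, tokens = {}, []
    for i, child in enumerate(children):
        child_path = path + (i,)
        child_yields = node_yields(child, child_path)
        yields.update(child_yields)
        tokens.extend(child_yields[child_path])
    yields[path] = tokens
    return yields

def treebank_phrase_pairs(src_tree, tgt_tree, links):
    """Turn each linked (source node, target node) pair into a phrase pair."""
    src, tgt = node_yields(src_tree), node_yields(tgt_tree)
    return [(" ".join(src[s]), " ".join(tgt[t])) for s, t in links]

def combine_and_score(baseline_pairs, treebank_pairs):
    """Merge phrase-pair counts (an assumed strategy) and estimate p(target | source)."""
    counts = Counter(baseline_pairs) + Counter(treebank_pairs)
    src_totals = Counter()
    for (s, _t), c in counts.items():
        src_totals[s] += c
    return {(s, t): c / src_totals[s] for (s, t), c in counts.items()}

# Toy data, invented for this example.
src_tree = ("NP", [("DT", ["the"]), ("NN", ["house"])])
tgt_tree = ("NP", [("DT", ["la"]), ("NN", ["maison"])])
links = [((), ()), ((0,), (0,)), ((1,), (1,))]  # root-root, DT-DT, NN-NN

pairs = treebank_phrase_pairs(src_tree, tgt_tree, links)
baseline = [("the", "la"), ("the", "le"), ("house", "maison")]
table = combine_and_score(baseline, pairs)
```

In this toy case the linked root nodes contribute the pair ("the house", "la maison"), which the baseline list lacks, and the extra count for ("the", "la") shifts probability mass toward it. A real system would extract baseline pairs from word alignments over a full corpus and would typically use several scoring features rather than a single relative frequency.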