Conference: International Conference on Computational Linguistics and Intelligent Text Processing

Exploiting Parallel Treebanks to Improve Phrase-Based Statistical Machine Translation



Abstract

Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.
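The step of adding treebank-derived phrase pairs to a baseline translation model can be illustrated with a minimal sketch. The abstract does not say how the two sets of phrase pairs are combined or scored, so everything below is an assumption: the function name `combine_phrase_tables`, the pooling of counts, the relative-frequency estimate of P(target | source), and the toy French-English pairs are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def combine_phrase_tables(baseline_pairs, treebank_pairs):
    """Pool two lists of (source, target) phrase pairs and score them.

    Hypothetical helper, not the paper's method: returns a dict mapping
    source -> {target: P(target | source)}, where the probability is the
    relative frequency of the pair in the pooled counts.
    """
    # Count every pair from both extraction methods in one pool.
    counts = Counter(baseline_pairs)
    counts.update(treebank_pairs)

    # Total occurrences of each source phrase, used as the denominator.
    source_totals = Counter()
    for (src, _tgt), n in counts.items():
        source_totals[src] += n

    # Relative-frequency estimate for each (source, target) pair.
    table = defaultdict(dict)
    for (src, tgt), n in counts.items():
        table[src][tgt] = n / source_totals[src]
    return dict(table)


# Toy data (invented for illustration only).
baseline = [
    ("la maison", "the house"),
    ("la maison", "the home"),
    ("maison", "house"),
]
treebank = [
    ("la maison", "the house"),
    ("la maison bleue", "the blue house"),
]

table = combine_phrase_tables(baseline, treebank)
for src, targets in table.items():
    print(src, targets)
```

In this sketch, a pair found by both extraction methods gains probability mass relative to its competitors, and a pair found only in the treebank enters the table as a new entry. A real system could equally keep the two sources as separate feature scores; the abstract does not specify which approach was used.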
