Journal: Machine Translation

Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment



Abstract

We study challenges raised by the order of Arabic verbs and their subjects in statistical machine translation (SMT). We show that the boundaries of post-verbal subjects (VS) are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. In addition, VS constructions have highly ambiguous reordering patterns when translated to English, and these patterns are very different for matrix (main clause) VS and non-matrix (subordinate clause) VS. Based on this analysis, we propose a novel method for leveraging VS information in SMT: we reorder VS constructions into pre-verbal (SV) order for word alignment. Unlike previous approaches to source-side reordering, phrase extraction and decoding are performed using the original Arabic word order. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline. Limiting reordering to matrix VS yields further improvements.
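The core idea in the abstract (reorder VS into SV only for word alignment, then carry on with the original Arabic order) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `reorder_vs_to_sv` and `map_alignment_back`, the half-open subject span, and the choice to move any material between the verb and the subject together with the verb are all assumptions made for illustration. The tokens are placeholders, and the alignment links are written by hand rather than produced by a real aligner.

```python
from typing import List, Set, Tuple


def reorder_vs_to_sv(
    tokens: List[str], verb_idx: int, subj_start: int, subj_end: int
) -> Tuple[List[str], List[int]]:
    """Move the subject span [subj_start, subj_end) in front of the verb.

    Returns the reordered tokens and a list `order` where order[j] is the
    original index of the token now at reordered position j.
    Assumption: anything between the verb and the subject moves with the verb.
    """
    if not (0 <= verb_idx < subj_start < subj_end <= len(tokens)):
        raise ValueError("expected verb before a non-empty subject span")
    order = (
        list(range(verb_idx))                  # material before the verb
        + list(range(subj_start, subj_end))    # subject, moved forward
        + list(range(verb_idx, subj_start))    # verb (and anything up to the subject)
        + list(range(subj_end, len(tokens)))   # rest of the sentence
    )
    return [tokens[i] for i in order], order


def map_alignment_back(
    links: Set[Tuple[int, int]], order: List[int]
) -> Set[Tuple[int, int]]:
    """Convert (reordered_source_pos, target_pos) links to original source positions."""
    return {(order[src], tgt) for src, tgt in links}


# Placeholder sentence: verb, two-token subject, one trailing token.
source = ["V", "S1", "S2", "X"]
reordered, order = reorder_vs_to_sv(source, verb_idx=0, subj_start=1, subj_end=3)
# reordered == ["S1", "S2", "V", "X"], order == [1, 2, 0, 3]

# Hand-written monotone links, standing in for an aligner run on the SV order.
links_on_reordered = {(0, 0), (1, 1), (2, 2), (3, 3)}
links_on_original = map_alignment_back(links_on_reordered, order)
# links_on_original == {(1, 0), (2, 1), (0, 2), (3, 3)}
```

With the links mapped back to original positions, phrase extraction and decoding would see the unmodified Arabic word order, which is the property the abstract contrasts with earlier source-side reordering approaches.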
