Journal: Machine Translation

Chunk-lattices for verb reordering in Arabic-English statistical machine translation (Special issues on machine translation for Arabic)



Abstract

Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often due to incorrect verb reordering in Verb-Subject-Object sentences. As a solution, we propose a chunk-based reordering technique to automatically displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data and to collect statistics about verb movements. From this analysis we build specific verb reordering lattices on the test sentences before decoding, and test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. The application of our reordering methods to the training and test data results in consistent improvements on the NIST-MT 2009 Arabic-English benchmark, both in terms of BLEU (+1.06%) and of reordering quality (+0.85%) measured with the Kendall Reordering Score.
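To make the idea of displacing a clause-initial verb chunk more concrete, the following is a minimal Python sketch, not the authors' implementation. It enumerates the chunk orders obtained by moving a sentence-initial verb chunk a few positions to the right, which is roughly the set of alternative paths a verb reordering lattice would encode. The chunk labels (`"VC"`, `"NC"`, `"PC"`), the `max_shift` limit, and both function names are assumptions introduced for illustration. The helper `kendall_tau_distance` computes a plain normalized Kendall tau distance between a permutation and the monotone order; it is shown only to illustrate the kind of quantity a Kendall-based reordering metric builds on, and is not claimed to be the exact Kendall Reordering Score used in the paper.

```python
from typing import List, Tuple

# A chunk is (label, tokens). The labels "VC"/"NC"/"PC" are hypothetical.
Chunk = Tuple[str, List[str]]


def verb_reorderings(chunks: List[Chunk], max_shift: int = 3) -> List[List[int]]:
    """Enumerate chunk orders that move a clause-initial verb chunk rightwards.

    Each result is a list of original chunk indices. The first entry is
    always the unchanged (monotone) order.
    """
    n = len(chunks)
    identity = list(range(n))
    # Only sentences that start with a verb chunk are reordered (assumption).
    if n == 0 or chunks[0][0] != "VC":
        return [identity]
    options = []
    for shift in range(0, min(max_shift, n - 1) + 1):
        # The verb chunk (index 0) is placed after `shift` following chunks.
        options.append(identity[1:shift + 1] + [0] + identity[shift + 1:])
    return options


def kendall_tau_distance(perm: List[int]) -> float:
    """Fraction of index pairs that appear in inverted order (0.0 = monotone)."""
    n = len(perm)
    if n < 2:
        return 0.0
    discordant = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return discordant / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Placeholder tokens; a real input would be chunked Arabic text.
    sentence = [("VC", ["v"]), ("NC", ["s1", "s2"]), ("NC", ["o"]), ("PC", ["p"])]
    for order in verb_reorderings(sentence, max_shift=2):
        print(order, round(kendall_tau_distance(order), 3))
```

In a fuller pipeline, each enumerated order would become a path in the lattice, and a weighting scheme or a trained classifier would assign scores used to prune unlikely paths before decoding. Those components are outside the scope of this sketch.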
