Home > Foreign Conference Proceedings > Annual Meeting of the Association for Computational Linguistics; ACL 2012 > Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the ℓ_0-norm

Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the ℓ_0-norm



Abstract

Two decades after their invention, the IBM word-based translation models, widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems. Although many models have surpassed them in accuracy, none have supplanted them in practice. In this paper, we propose a simple extension to the IBM models: an ℓ_0 prior to encourage sparsity in the word-to-word translation model. We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
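To make the sparsity idea concrete: the ℓ_0 "norm" of a translation-probability vector simply counts its nonzero entries, so penalizing it favors word-to-word tables in which each source word translates to only a few targets. Because ℓ_0 is non-differentiable, optimization typically goes through a smooth surrogate. The sketch below is illustrative only, not the paper's algorithm: it contrasts the exact nonzero count with one common differentiable surrogate, Σ_i |p_i|^α, which approaches the ℓ_0 count as α → 0 (the function names and the choice of surrogate are our own for illustration).

```python
import numpy as np

def l0_norm(p, tol=1e-12):
    # Exact l0 "norm": the number of entries with magnitude above tol.
    return int(np.sum(np.abs(p) > tol))

def smoothed_l0(p, alpha=0.05):
    # Differentiable surrogate: sum_i |p_i|^alpha.
    # As alpha -> 0, each nonzero term -> 1, so the sum -> l0_norm(p).
    return float(np.sum(np.abs(p) ** alpha))

# A sparse word-to-word translation distribution over four target words:
# only two of the four candidates receive probability mass.
p = np.array([0.7, 0.3, 0.0, 0.0])
print(l0_norm(p))              # 2: two nonzero translation options
print(smoothed_l0(p, 0.01))    # approximately 2 for small alpha
```

Subtracting a penalty of this form from the EM objective pushes low-count translation candidates all the way to zero rather than leaving them with tiny residual probability, which is the behavior the abstract's "smaller alignment models" refers to.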

