Annual Meeting of the Association for Computational Linguistics

Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the ℓ_0-norm



Abstract

Two decades after their invention, the IBM word-based translation models, widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems. Although many models have surpassed them in accuracy, none have supplanted them in practice. In this paper, we propose a simple extension to the IBM models: an ℓ_0 prior to encourage sparsity in the word-to-word translation model. We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
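To make the sparsity idea concrete, the sketch below trains IBM Model 1 by EM and hard-prunes tiny expected counts between iterations so that many translation parameters become exactly zero. This pruning is only a crude, illustrative stand-in for the paper's ℓ_0 prior (the paper's actual formulation and its GIZA++ implementation differ); the function and data names here are invented for the example.

```python
from collections import defaultdict

def _normalize(t):
    """Renormalize t(e|f) so probabilities sum to 1 for each source word f."""
    totals = defaultdict(float)
    for (e, f), v in t.items():
        totals[f] += v
    for (e, f) in list(t):
        if totals[f] > 0:
            t[(e, f)] /= totals[f]

def train_ibm1_sparse(bitext, iterations=10, prune=1e-3):
    """IBM Model 1 EM with hard pruning of tiny expected counts.

    Pruning small counts is a simplified stand-in for a sparsity-inducing
    (l0-style) prior: parameters whose support falls below `prune` are
    removed from the model entirely, shrinking the translation table.
    """
    # Initialize t(e|f) uniformly over observed co-occurrences.
    t = defaultdict(float)
    for f_sent, e_sent in bitext:
        for e in e_sent:
            for f in f_sent:
                t[(e, f)] = 1.0
    _normalize(t)

    for _ in range(iterations):
        counts = defaultdict(float)  # expected counts c(e, f)
        for f_sent, e_sent in bitext:
            for e in e_sent:
                # Normalizer over all candidate alignments of e.
                z = sum(t[(e, f)] for f in f_sent)
                if z == 0:
                    continue
                for f in f_sent:
                    counts[(e, f)] += t[(e, f)] / z
        # Drive weakly supported parameters to exactly zero.
        counts = {k: v for k, v in counts.items() if v >= prune}
        t = defaultdict(float, counts)
        _normalize(t)
    return t
```

On a toy bitext such as `[(["das", "Haus"], ["the", "house"]), (["das", "Buch"], ["the", "book"])]`, EM concentrates probability on `t("the" | "das")`, while pruning keeps the table from accumulating mass on spurious pairs.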
