Conference: Association for Computational Linguistics Annual Meeting: Human Language Technologies

Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora


Abstract

Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. In this paper, we present a novel unsupervised method that automatically extracts the relation between a full-form phrase and its abbreviation from monolingual corpora, and induces translation entries for the abbreviation by using its full-form as a bridge. Our method does not require any additional annotated data other than the data that a regular translation system uses. We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets.
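The bridging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `induce_abbreviation_entries`, the two dictionary inputs, the product-of-scores combination, and all numeric values are assumptions chosen only for illustration. The sketch assumes the full-form/abbreviation relations have already been extracted from monolingual text and that the full-form already has entries in the translation system's phrase table.

```python
from collections import defaultdict


def induce_abbreviation_entries(full_to_abbr, phrase_table):
    """Induce translation entries for abbreviations via their full-forms.

    full_to_abbr: {full_form: {abbreviation: relation_score}}  (hypothetical input)
    phrase_table: {full_form: {english_phrase: translation_score}}  (hypothetical input)
    Returns {abbreviation: {english_phrase: induced_score}}.
    """
    induced = defaultdict(dict)
    for full_form, abbreviations in full_to_abbr.items():
        # Translations already known for the full-form act as the bridge.
        translations = phrase_table.get(full_form, {})
        for abbr, relation_score in abbreviations.items():
            for english, translation_score in translations.items():
                # Assumed scoring: multiply the two scores and keep the best
                # value when several full-forms propose the same translation.
                score = relation_score * translation_score
                previous = induced[abbr].get(english, 0.0)
                induced[abbr][english] = max(previous, score)
    return {abbr: dict(entries) for abbr, entries in induced.items()}


# Illustrative data only: the scores are invented for the example.
full_to_abbr = {"香港总督": {"港督": 0.9}}
phrase_table = {
    "香港总督": {"hong kong governor": 0.6, "governor of hong kong": 0.4}
}
entries = induce_abbreviation_entries(full_to_abbr, phrase_table)
```

Here the abbreviation 港督 inherits both English translations of its full-form 香港总督, so a decoder that previously treated 港督 as an unknown word would have candidate entries for it. How the real system scores and filters these entries is not stated in the abstract.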
