Association for Computational Linguistics Annual Meeting: Human Language Technologies; ACL-08: HLT
Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora

Abstract

Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. In this paper, we present a novel unsupervised method that automatically extracts the relation between a full-form phrase and its abbreviation from monolingual corpora, and induces translation entries for the abbreviation by using its full form as a bridge. Our method does not require any additional annotated data other than the data that a regular translation system uses. We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets.
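
The bridging idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `induce_abbreviation_entries`, the two toy dictionaries, and the scores are all hypothetical, and the rule of keeping the best score per translation is an assumption made only for illustration. The sketch assumes the abbreviation-to-full-form relations have already been extracted and that a phrase table already holds translations for the full forms.

```python
from collections import defaultdict

# Hypothetical toy inputs for illustration only; not data from the paper.
# Abbreviation -> full-form relations, as might be mined from monolingual text.
ABBREVIATION_TO_FULL_FORMS = {
    "北大": ["北京大学"],
    "奥运会": ["奥林匹克运动会"],
}

# Full form -> (translation, score) entries, as might exist in a phrase table.
# The scores are made up.
FULL_FORM_TRANSLATIONS = {
    "北京大学": [("Peking University", 0.9), ("Beijing University", 0.1)],
    "奥林匹克运动会": [("Olympic Games", 1.0)],
}


def induce_abbreviation_entries(abbr_to_full, full_translations):
    """Give each abbreviation the translations of its full forms (the 'bridge')."""
    induced = defaultdict(dict)
    for abbreviation, full_forms in abbr_to_full.items():
        for full_form in full_forms:
            # Full forms missing from the phrase table contribute nothing.
            for translation, score in full_translations.get(full_form, []):
                # Assumption: keep the best score when several full forms
                # yield the same translation.
                previous = induced[abbreviation].get(translation, 0.0)
                induced[abbreviation][translation] = max(previous, score)
    return {abbr: dict(entries) for abbr, entries in induced.items()}


if __name__ == "__main__":
    entries = induce_abbreviation_entries(
        ABBREVIATION_TO_FULL_FORMS, FULL_FORM_TRANSLATIONS
    )
    for abbreviation, translations in entries.items():
        print(abbreviation, translations)
```

An abbreviation whose full form has no entry in the phrase table simply receives no induced translations, which mirrors the unknown-word case the abstract describes.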
