ACM Transactions on Asian Language Information Processing

Aligning Word Senses Using Bilingual Corpora

Abstract

The growing importance of multilingual information retrieval and machine translation has made multilingual ontologies extremely valuable resources. Because building an ontology from scratch is expensive and time-consuming, it is attractive to consider ways of automatically aligning the monolingual ontologies that already exist for many of the world's major languages. Previous research has relied either on structural similarity between the ontologies being aligned or on manually created bilingual resources. These approaches cannot align ontologies with vastly different structures, and they apply only to well-studied language pairs for which expensive resources are already available. In this paper, we propose a novel approach that aligns ontologies at the node level: given a concept represented by a particular word sense in one ontology, our task is to find the best corresponding word sense in the second-language ontology. To this end, we present a language-independent, corpus-based method that borrows techniques from information retrieval and machine translation. We demonstrate its effectiveness by applying it to two very different ontologies in two very different languages: the Mandarin Chinese HowNet and the American English WordNet. Moreover, we propose a methodology for measuring the comparability of bilingual corpora and show that, when clean parallel corpora are not available, our method is robust enough to make efficient use of noisy, nonparallel bilingual corpora.
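The abstract does not spell out how candidate senses are scored. As a rough illustration of the general idea of borrowing information-retrieval techniques for cross-lingual sense matching, the sketch below represents each word sense as a bag of co-occurring context words, projects the source-language vector into the target language through a bilingual lexicon, and ranks the candidate target senses by cosine similarity. This is an assumed, simplified stand-in, not the authors' method; the lexicon, the context counts, the sense identifiers, and the function names `translate_vector`, `cosine`, and `best_sense` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def translate_vector(vec, lexicon):
    """Project a source-language context vector into the target language.

    Hypothetical scheme: each source word's weight is split evenly among
    its dictionary translations; words missing from the lexicon are dropped.
    """
    out = Counter()
    for word, weight in vec.items():
        translations = lexicon.get(word, [])
        for t in translations:
            out[t] += weight / len(translations)
    return out


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_sense(source_vec, target_senses, lexicon):
    """Return (sense_id, score) for the target sense closest to the source sense.

    target_senses maps a (hypothetical) sense id to its context vector and
    must be non-empty.
    """
    projected = translate_vector(source_vec, lexicon)
    scored = [(sid, cosine(projected, vec)) for sid, vec in target_senses.items()]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy data, invented for illustration: a Chinese sense of 银行 ("bank")
    # described by its context words, and two English candidate senses.
    lexicon = {"钱": ["money"], "存款": ["deposit"], "贷款": ["loan"], "河": ["river"]}
    source = {"钱": 2, "存款": 1, "贷款": 1}
    targets = {
        "bank.financial": {"money": 3, "deposit": 2, "loan": 1},
        "bank.river": {"river": 3, "water": 2, "shore": 1},
    }
    sid, score = best_sense(source, targets, lexicon)
    print(sid)  # bank.financial
```

Splitting a word's weight across its translations keeps ambiguous dictionary entries from inflating the projected vector; a real system would likely also weight context words (for example with TF-IDF) and use much larger corpora.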