Conference: Fourth International Conference on Genetic and Evolutionary Computing

Minimum Normalized Google Distance for Unsupervised Multilingual Chinese-English Word Sense Disambiguation


Abstract

This paper introduces the normalized Google distance into the study of word sense disambiguation and presents a novel unsupervised disambiguation method. The normalized Google distance is a similarity measure between words and phrases, grounded in information distance and Kolmogorov complexity, that uses the World Wide Web as its database, with page counts obtained from a search engine such as Google. The method treats word sense disambiguation as a search for the minimum normalized Google distance between an n-gram and the translations or synonyms of the target word, under the assumption of one sense per n-gram. Our system is tested on the Multilingual Chinese-English Lexical Sample task of SemEval-2007. Experimental results show that our method outperforms the best competing system. Our experiment on the nouns in this dataset also gives promising results.
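The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly cited definition of the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f gives page counts and N is the assumed number of indexed pages; the abstract does not spell out the formula, so this is an assumption rather than the authors' exact implementation. The names `ngd`, `pick_sense`, `count`, and `joint_count` are hypothetical, and the page counts are supplied by the caller instead of being fetched from a real search engine.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance computed from page counts.

    fx, fy: page counts for each term alone; fxy: page count for both
    terms together; n: assumed total number of indexed pages.
    """
    # Convention assumed here: terms that never co-occur are maximally distant.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def pick_sense(ngram, candidates, count, joint_count, n):
    """Return the candidate translation or synonym closest to the n-gram.

    count(term) and joint_count(a, b) are caller-supplied (hypothetical)
    page-count lookups.
    """
    fx = count(ngram)
    return min(
        candidates,
        key=lambda c: ngd(fx, count(c), joint_count(ngram, c), n),
    )


# Toy page counts, invented purely for illustration.
single = {"context n-gram": 1000, "sense A": 2000, "sense B": 2000}
joint = {("context n-gram", "sense A"): 500, ("context n-gram", "sense B"): 10}

best = pick_sense(
    "context n-gram",
    ["sense A", "sense B"],
    count=single.__getitem__,
    joint_count=lambda a, b: joint[(a, b)],
    n=10**6,
)
```

With these toy counts, "sense A" co-occurs with the n-gram far more often than "sense B", so it has the smaller distance and is selected. Passing the counts in as functions keeps the sketch independent of any particular search engine API.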
