Chinese journal: 《情报学报》

Word Relatedness Computation Based on Chinese Wikipedia

         

Abstract

Computing word relatedness is one of the key techniques in natural language processing, with wide application in information retrieval, machine translation, word sense disambiguation, and syntactic parsing. Most existing methods in China for computing word relatedness are based on HowNet. This paper instead uses Chinese Wikipedia as the semantic resource, exploiting its category hierarchy and the links between concept articles to compute the relatedness between Chinese words. Drawing on the vector space model and the Google Similarity Distance, the method constructs a category graph and related semantic vectors to compute the relatedness of Chinese words. Experiments were carried out on the WordSimilarity-353 test set, and the Spearman rank correlation coefficient of the results indicates that the proposed method is feasible and effective.
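The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two generic ingredients it names: a Google-Similarity-Distance-style measure applied to article link sets, and the Spearman rank correlation used for evaluation. The function names, the use of incoming-link sets, and the conversion from distance to relatedness are assumptions made for illustration, not the authors' implementation.

```python
import math

def normalized_link_distance(links_a, links_b, total_articles):
    """Normalized Google Distance computed over link sets (assumed inputs)."""
    a, b = set(links_a), set(links_b)
    common = a & b
    if not a or not b or not common:
        return float("inf")  # no shared links: treat as maximally distant
    log_a, log_b = math.log(len(a)), math.log(len(b))
    denominator = math.log(total_articles) - min(log_a, log_b)
    if denominator <= 0:
        return float("inf")  # degenerate case: a set covers every article
    return (max(log_a, log_b) - math.log(len(common))) / denominator

def relatedness(links_a, links_b, total_articles):
    """Hypothetical mapping of the distance into a [0, 1] relatedness score."""
    return max(0.0, 1.0 - normalized_link_distance(links_a, links_b, total_articles))

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy)  # undefined if either input is constant
```

In an evaluation of the kind the abstract describes, `spearman` would compare the system's scores for each word pair against the human ratings in the test set; how the paper combines the category graph with the semantic vectors is not stated here and is left out of the sketch.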
