Journal: Information Processing & Management

An approach for measuring semantic similarity between Wikipedia concepts using multiple inheritances


Abstract

Wikipedia provides a huge, collaboratively built, semi-structured taxonomy called the Wikipedia category graph (WCG), which can be used as a Knowledge Graph (KG) to measure the semantic similarity (SS) between Wikipedia concepts. Several Most Informative Common Ancestor-based (MICA-based) SS methods have previously been proposed that intrinsically exploit the taxonomic structure of WCG. However, some basic structural issues in WCG, such as its huge size, branching factor and multiple inheritance relations, hamper the applicability of traditional MICA-based and multiple inheritance-based approaches to it. Therefore, in this paper, we propose a solution to handle these structural issues and present a new multiple inheritance-based SS approach, called Neighborhood Ancestor Semantic Contribution (NASC). In this approach, we first define the neighborhood of a category (a taxonomic concept in WCG) to delimit its semantic space. Second, we describe the semantic value of a category by aggregating the intrinsic information content (IC)-based semantic contribution weights of its semantically relevant multiple ancestors. Third, based on our approach, we propose six different methods to compute the SS between Wikipedia concepts. Finally, we evaluate our methods on gold standard word similarity benchmarks for English, German, Spanish and French. The experimental evaluation demonstrates that the proposed NASC-based methods remarkably outperform traditional MICA-based and multiple inheritance-based approaches.
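
The abstract gives no formulas, so the following is only a minimal sketch of the general ideas it names, not the NASC method itself. It assumes a hypothetical toy category graph (`PARENTS`), a Seco-style intrinsic IC computed from descendant counts, a Resnik-style MICA similarity, and an illustrative multiple-inheritance measure that aggregates the IC of all shared ancestors. Every category name and function name here is invented for illustration.

```python
import math

# Hypothetical toy category graph: child -> parents (multiple inheritance allowed).
PARENTS = {
    "Root": [],
    "Science": ["Root"],
    "Technology": ["Root"],
    "Computing": ["Science", "Technology"],
    "Biology": ["Science"],
    "Bioinformatics": ["Computing", "Biology"],
    "Software": ["Computing"],
}


def ancestors(node):
    """Return every ancestor of `node`, including `node` itself."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendant_count(node):
    """Count the proper descendants of `node`."""
    return sum(1 for other in PARENTS if other != node and node in ancestors(other))


def intrinsic_ic(node):
    """Assumed Seco-style intrinsic IC: 1 - log(descendants + 1) / log(N)."""
    return 1.0 - math.log(descendant_count(node) + 1) / math.log(len(PARENTS))


def mica_similarity(a, b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    return max(intrinsic_ic(c) for c in ancestors(a) & ancestors(b))


def shared_ic_similarity(a, b):
    """Illustrative multiple-inheritance measure (an assumption, not NASC):
    summed IC of shared ancestors divided by summed IC of all ancestors."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    shared = sum(intrinsic_ic(c) for c in anc_a & anc_b)
    total = sum(intrinsic_ic(c) for c in anc_a | anc_b)
    return shared / total if total else 0.0


if __name__ == "__main__":
    print(mica_similarity("Bioinformatics", "Software"))
    print(shared_ic_similarity("Bioinformatics", "Software"))
```

In this sketch the MICA measure sees only the single best common ancestor ("Computing" for the pair above), whereas the aggregated measure also credits the other shared ancestors, which is the kind of contrast between MICA-based and multiple inheritance-based approaches that the abstract draws.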
