Journal: Knowledge and Data Engineering, IEEE Transactions on

A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity


Abstract

Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches to this task: knowledge-based and corpus-based. However, existing approaches are better suited to measuring semantic similarity between single words than between the more general multi-word expressions (MWEs), and they do not scale well. In contrast to these existing techniques, we propose an efficient and effective approach to semantic similarity that uses a large-scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space and compare their similarity there. Then, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms, including MWEs and ambiguous terms, and that it significantly outperforms 12 competing methods under the Pearson correlation coefficient. Moreover, our approach is much more efficient than all competing algorithms and can be used to compute semantic similarity at large scale.
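
The two steps named in the abstract (map terms into a concept space, then compare them after grouping overlapping concepts) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `TERM_CONCEPTS` weights, the `CONCEPT_CLUSTER` grouping, the function names, and the use of cosine similarity are hypothetical and are not taken from the paper, whose actual network, clustering algorithm, and scoring function are not given in this abstract.

```python
import math
from collections import defaultdict

# Hypothetical toy data: term -> {concept: weight}. Invented for illustration;
# a real semantic network would supply these weights.
TERM_CONCEPTS = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "microsoft": {"company": 0.7, "software company": 0.3},
    "pear": {"fruit": 1.0},
}

# Hypothetical concept clusters: overlapping concepts are merged into one
# dimension so that the axes of the concept space are closer to orthogonal.
CONCEPT_CLUSTER = {
    "company": "organization",
    "software company": "organization",
    "fruit": "food",
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def to_cluster_space(vec, cluster_of):
    """Sum concept weights per cluster; unclustered concepts keep their name."""
    merged = defaultdict(float)
    for concept, weight in vec.items():
        merged[cluster_of.get(concept, concept)] += weight
    return dict(merged)


def similarity(term_a, term_b, cluster_of=None):
    """Map both terms to concept vectors and compare them with cosine."""
    u = TERM_CONCEPTS.get(term_a, {})
    v = TERM_CONCEPTS.get(term_b, {})
    if cluster_of is not None:
        u = to_cluster_space(u, cluster_of)
        v = to_cluster_space(v, cluster_of)
    return cosine(u, v)


if __name__ == "__main__":
    print(similarity("apple", "pear"))
    print(similarity("apple", "microsoft"))
    print(similarity("apple", "microsoft", CONCEPT_CLUSTER))
```

In this toy setup, merging "company" and "software company" into one dimension raises the score for "apple" and "microsoft", because weight that was split across two near-synonymous concepts now counts as shared. That is the intuition behind orthogonalizing the concept space, shown here only on invented numbers.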
