Journal: Artificial Intelligence

Computing text semantic relatedness using the contents and links of a hypertext encyclopedia



Abstract

We propose a method for computing semantic relatedness between words or texts by using knowledge from hypertext encyclopedias such as Wikipedia. A network of concepts is built by filtering the encyclopedia's articles, each concept corresponding to an article. Two types of weighted links between concepts are considered: one based on hyperlinks between the texts of the articles, and another based on the lexical similarity between them. We propose and implement an efficient random walk algorithm that computes the distance between nodes, and then between sets of nodes, using the visiting probability from one (set of) node(s) to another. Moreover, to make the algorithm tractable, we propose and empirically validate two truncation methods, and then use an embedding space to learn an approximation of the visiting probability. To evaluate the proposed distance, we apply our method to four important tasks in natural language processing: word similarity, document similarity, document clustering and classification, and ranking in information retrieval. The performance of the method is state-of-the-art or close to it for each task, thus demonstrating the generality of the knowledge resource. Furthermore, using both hyperlinks and lexical similarity links improves the scores compared with using only one of them, because hyperlinks bring additional real-world knowledge not captured by lexical similarity.
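The abstract describes the core computation only in words. The Python sketch below illustrates the general idea of a truncated visiting probability between sets of concepts. It rests on several assumptions that are ours, not the authors'. Concepts are integer ids. The two link types are mixed into one transition matrix with a hypothetical weight `lam`. A text is assumed to be already mapped to a set of concept ids. Truncation simply caps the walk at `max_steps` steps, which is one simple scheme and not necessarily either of the two proposed in the paper. The helper names `mix_links`, `visiting_probability` and `relatedness` are hypothetical, and so is the symmetric averaging in `relatedness`. The embedding-space approximation is not shown.

```python
from typing import Dict, Set

# Illustrative sketch only: names, the link-mixing rule and the truncation
# scheme are assumptions, not the paper's implementation.
# Sparse weighted graph: node id -> {neighbour id: non-negative weight}.
Graph = Dict[int, Dict[int, float]]


def row_normalize(g: Graph) -> Graph:
    """Turn outgoing edge weights into transition probabilities."""
    out: Graph = {}
    for u, nbrs in g.items():
        total = sum(nbrs.values())
        out[u] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}
    return out


def mix_links(hyper: Graph, lexical: Graph, lam: float) -> Graph:
    """Assumed mixing rule: P = lam * P_hyper + (1 - lam) * P_lexical.

    If a node has only one link type, its row sums to less than 1 and the
    missing mass is treated as the walk stopping at that node.
    """
    ph, pl = row_normalize(hyper), row_normalize(lexical)
    mixed: Graph = {}
    for u in set(ph) | set(pl):
        row: Dict[int, float] = {}
        for v, p in ph.get(u, {}).items():
            row[v] = row.get(v, 0.0) + lam * p
        for v, p in pl.get(u, {}).items():
            row[v] = row.get(v, 0.0) + (1.0 - lam) * p
        mixed[u] = row
    return mixed


def visiting_probability(p: Graph, sources: Set[int], targets: Set[int],
                         max_steps: int) -> float:
    """Truncated visiting probability from one node set to another.

    The walk starts uniformly over `sources`; target nodes absorb the walk,
    and only walks of at most `max_steps` steps are counted (the truncation).
    """
    dist: Dict[int, float] = {s: 1.0 / len(sources) for s in sources}
    hit = 0.0
    for step in range(max_steps + 1):
        # Absorb the probability mass that has just reached a target node.
        for t in [n for n in dist if n in targets]:
            hit += dist.pop(t)
        if step == max_steps or not dist:
            break
        # Propagate the remaining mass one step along the transition matrix.
        nxt: Dict[int, float] = {}
        for u, mass in dist.items():
            for v, prob in p.get(u, {}).items():
                nxt[v] = nxt.get(v, 0.0) + mass * prob
        dist = nxt
    return hit


def relatedness(p: Graph, a: Set[int], b: Set[int], max_steps: int) -> float:
    """Hypothetical symmetric score: mean of VP(a -> b) and VP(b -> a)."""
    return 0.5 * (visiting_probability(p, a, b, max_steps)
                  + visiting_probability(p, b, a, max_steps))


# Toy network of four concepts (standing in for four encyclopedia articles).
hyper = {0: {1: 1.0}, 1: {2: 1.0}, 2: {3: 1.0}, 3: {0: 1.0}}
lexical = {0: {2: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}, 3: {2: 1.0}}
P = mix_links(hyper, lexical, lam=0.5)
print(visiting_probability(P, {0}, {2}, max_steps=2))  # 0.75
```

Because target nodes absorb the walk in this formulation, the truncated value can only grow as `max_steps` increases, so capping the walk yields a lower bound on the untruncated visiting probability while bounding the cost of each query. In the toy graph, setting `lam=1.0` (hyperlinks only) makes concept 2 unreachable from concept 0 in one step, whereas the mixed graph already gives 0.5 after one step. This hints at why combining the two link types can change the scores.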