Workshop on Graph-based Methods for Natural Language Processing, 2011

Using a Wikipedia-based Semantic Relatedness Measure for Document Clustering



Abstract

A graph-based distance between Wikipedia articles is defined using a random walk model, which estimates visiting probability (VP) between articles using two types of links: hyperlinks and lexical similarity relations. The VP to and from a set of articles is then computed, and approximations are proposed to make tractable the computation of semantic relatedness between every two texts in a large data set. The model is applied to document clustering on the 20 Newsgroups data set. Precision and recall are improved in comparison with previous textual distance algorithms.
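The visiting probability (VP) described in the abstract can be illustrated with a small sketch. The snippet below is an illustrative assumption, not the paper's implementation: it mixes the two edge types (hyperlinks and lexical similarity) into one row-stochastic transition matrix, then estimates the probability that a truncated random walk from a source article visits a target article by making the target state absorbing. The mixing weight `alpha` and all matrices are hypothetical.

```python
import numpy as np

def transition_matrix(hyperlinks, lexical, alpha=0.5):
    """Row-normalized mix of hyperlink and lexical-similarity edge weights.
    `alpha` (assumed here) balances the two link types."""
    W = alpha * hyperlinks + (1 - alpha) * lexical
    row_sums = W.sum(axis=1, keepdims=True)
    return W / np.where(row_sums == 0, 1.0, row_sums)

def visiting_probability(T, source, target, max_steps=10):
    """P(a walk from `source` reaches `target` within `max_steps` steps).
    Computed by making `target` absorbing and iterating the walk: any
    probability mass that ever arrives at `target` stays there, so the
    mass at `target` after max_steps is exactly the visiting probability."""
    A = T.copy()
    A[target, :] = 0.0
    A[target, target] = 1.0               # absorbing target state
    p = np.zeros(T.shape[0])
    p[source] = 1.0                        # walk starts at the source article
    for _ in range(max_steps):
        p = p @ A                          # one step of the random walk
    return p[target]
```

Truncating the walk at `max_steps` is one way to keep the computation tractable over a large article graph, in the spirit of the approximations the abstract mentions; an asymmetric VP in both directions can then be combined into a relatedness score between two texts.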
