首页> 外文期刊>IEEE Transactions on Knowledge and Data Engineering >Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation
【24h】

Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation

机译:图的节点之间相似度的随机游动计算及其在协同推荐中的应用

获取原文
获取原文并翻译 | 示例

摘要

This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database
机译:这项工作为表征数据库元素之间或更相似的加权和无向图的节点之间的相似性提供了新的视角。它基于随机遍历数据库的马尔可夫链模型。更准确地说,我们计算出的数量(平均通勤时间,图的Laplacian矩阵的伪逆等)在任意一对节点之间都具有相似性,当连接这些元素的路径数量增加时,具有增加的好特性。当路径的“长度”减小时。事实证明,平均通勤时间的平方根是欧几里得距离,而拉普拉斯矩阵的伪逆是一个核矩阵(其元素是与通勤时间密切相关的内积)。引入图的主成分分析(PCA)来计算节点向量的子空间投影,其方式应在欧氏通勤时间距离方面保留尽可能多的方差。该图形PCA为广泛用于图形分区的“ Fiedler向量”提供了很好的解释。该模型是在一项协作推荐任务上进行评估的,该任务根据过去观看的内容提出有关人们应该观看哪些电影的建议。 MovieLens数据库上的实验结果表明,与其他方法相比,基于Laplacian的相似性表现良好。该模型非常适合所谓的“统计关系学习”框架,也可以用于计算文档或单词的相似度,并且更广泛地,它可以应用于涉及关系的机器学习和模式识别任务数据库

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号