首页> 外文OA文献 >A novel way of computing similarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes
【2h】

A novel way of computing similarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes

机译:一种计算图节点之间相似度的新颖方法,应用于图节点的协同过滤和子空间投影

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

This work presents a new perspective on characterizing the similaritybetween elements of a database or, more generally, nodes of a weightedand undirected graph. It is based on a Markov-chain model of randomwalk through the database. More precisely, we compute quantities (theaverage commute time, the pseudoinverse of the Laplacian matrixof the graph, etc) that provide similarities between any pair of nodes,having the nice property of increasing when the number of paths connect-ing those elements increases and when the “length” of paths decreases.It turns out that the square root of the average commute time is a Eu-clidean distance and that the pseudoinverse of the Laplacian matrix is akernel (its elements are inner products closely related to commute times).A procedure for computing the subspace projection of the node vectorsof the graph that preserves as much variance as possible in terms of thecommute-time distance – a principal component analysis (PCA) of thegraph – is also introduced. This graph PCA provides a nice interpreta-tion to the “Fiedler vector”, widely used for graph partitioning. Themodel is evaluated on a collaborative-recommendation task where sug-gestions are made about which movies people should watch based uponwhat they watched in the past. Experimental results on the MovieLensdatabase show that the Laplacian-based similarities (the pseudoinverse ofthe Laplacian matrix and the “random-forest matrix”) perform well incomparison with other methods. The model, which nicely fits into theso-called “statistical relational learning” framework, could also be usedto compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a database.
机译:这项工作为表征数据库元素之间或更相似地加权和无向图的节点之间的相似性提供了新的视角。它基于通过数据库的随机游走的马尔可夫链模型。更精确地说,我们计算出的数量(平均通勤时间,图的Laplacian矩阵的伪逆等)在任意一对节点之间都具有相似性,当连接这些元素的路径数量增加以及结果表明,平均通勤时间的平方根是一个Eu-clidean距离,而拉普拉斯矩阵的伪逆是阿格涅尔(其元素是与通勤时间密切相关的内积)。还介绍了计算图的节点向量的子空间投影的过程,该过程在通勤时间距离方面保留了尽可能多的方差-该图的主成分分析(PCA)。该图形PCA为广泛用于图形分区的“ Fiedler向量”提供了很好的解释。该模型是在一项合作推荐任务上进行评估的,该任务根据人们过去看过的电影来提出建议,说明人们应该看哪些电影。在MovieLens数据库上的实验结果表明,基于拉普拉斯的相似性(拉普拉斯矩阵和“随机森林矩阵”的伪逆)与其他方法相比表现良好。该模型非常适合所谓的“统计关系学习”框架,也可以用于计算文档或单词的相似度,更广泛地讲,它可以应用于涉及数据库的机器学习和模式识别任务。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号