Practical Algorithms and Lower Bounds for Similarity Search in Massive Graphs

Daniel Fogaras; Balazs Racz


Abstract

To exploit the similarity information hidden in the hyperlink structure of the Web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multistep neighborhoods of vertices is numerically evaluated by similarity functions including SimRank [1], a recursive refinement of cocitation, and PSimRank, a novel variant with better theoretical characteristics. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints; at query time, similarities are estimated from the fingerprints. We justify our approximation method by asymptotic worst-case lower bounds: we show that there is a significant gap between exact and approximate approaches, and suggest that exact computation is, in general, infeasible for large-scale inputs. We were the first to evaluate SimRank on real Web data. On the Stanford WebBase [2] graph of 80M pages, the quality of the methods increased significantly in each refinement step until step four.
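The precompute-then-query pattern described in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the abstract does not specify the fingerprint layout, so this sketch assumes the well-known random-surfer-pair reading of SimRank, in which the similarity of two vertices is the expected value of decay^t, where t is the first step at which two backward random walks meet. The names `build_fingerprints`, `estimate_similarity`, `in_links`, `walk_length`, and the default `decay=0.6` are hypothetical, and storing full paths in memory is a simplification that would not scale to billions of vertices.

```python
import random


def build_fingerprints(in_links, num_fingerprints, walk_length, seed=0):
    """Precompute reverse random walks ("fingerprints") for every vertex.

    in_links maps each vertex to the list of vertices linking to it; every
    vertex, including those with no in-links, must appear as a key.
    """
    rng = random.Random(seed)
    vertices = sorted(in_links)
    index = []
    for _ in range(num_fingerprints):
        # One shared random in-neighbor per (step, vertex), so that walks
        # which meet at a vertex continue together (coalescing walks).
        steps = [
            {v: (rng.choice(in_links[v]) if in_links[v] else None)
             for v in vertices}
            for _ in range(walk_length)
        ]
        paths = {}
        for v in vertices:
            path = [v]
            for step in steps:
                cur = path[-1]
                # A walk that reached a vertex without in-links stays stopped.
                path.append(step[cur] if cur is not None else None)
            paths[v] = path
        index.append(paths)
    return index


def estimate_similarity(index, u, v, decay=0.6):
    """Average decay**t over fingerprints, t being the first meeting step."""
    total = 0.0
    for paths in index:
        for t, (x, y) in enumerate(zip(paths[u], paths[v])):
            if x is None or y is None:
                break  # one walk has stopped, so the pair can never meet
            if x == y:
                total += decay ** t
                break
    return total / len(index)


# Toy graph: page "a" links to both "b" and "c".
toy_in_links = {"a": [], "b": ["a"], "c": ["a"]}
toy_index = build_fingerprints(toy_in_links, num_fingerprints=100, walk_length=4)
print(estimate_similarity(toy_index, "b", "c"))
```

On the toy graph, both walks from "b" and "c" reach "a" after one step in every fingerprint, so the estimate is approximately the decay factor, which agrees with the SimRank recursion for two pages whose only in-neighbor is the same page. On larger graphs the estimate is random, and its accuracy depends on the number of fingerprints and the truncation length.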


Bibliographic record

Source: IEEE Transactions on Knowledge and Data Engineering, 2007, issue 2007, pp. 585-598 (14 pages)
Authors: Daniel Fogaras; Balazs Racz
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords: Web search; graph algorithms; probabilistic algorithms; similarity measures

