Given a large graph, how can we determine similarity between nodes in a fastand accurate way? Random walk with restart (RWR) is a popular measure for thispurpose and has been exploited in numerous data mining applications includingranking, anomaly detection, link prediction, and community detection. However,previous methods for computing exact RWR require prohibitive storage sizes andcomputational costs, and alternative methods which avoid such costs bycomputing approximate RWR have limited accuracy. In this paper, we propose TPA,a fast, scalable, and highly accurate method for computing approximate RWR onlarge graphs. TPA exploits two important properties in RWR: 1) nodes close to aseed node are likely to be revisited in following steps due to block-wisestructure of many real-world graphs, and 2) RWR scores of nodes which residefar from the seed node are proportional to their PageRank scores. Based onthese two properties, TPA divides approximate RWR problem into two subproblemscalled neighbor approximation and stranger approximation. In the neighborapproximation, TPA estimates RWR scores of nodes close to the seed based onscores of few early steps from the seed. In the stranger approximation, TPAestimates RWR scores for nodes far from the seed using their PageRank. Thestranger and neighbor approximations are conducted in the preprocessing phaseand the online phase, respectively. Through extensive experiments, we show thatTPA requires up to 3.5x less time with up to 40x less memory space than otherstate-of-the-art methods for the preprocessing phase. In the online phase, TPAcomputes approximate RWR up to 30x faster than existing methods whilemaintaining high accuracy.
展开▼