Obtaining the items that are highly relevant to a given set of query items is the basis for various applications, such as recommendation and prediction. A family of path-based relevance metrics, which quantify item relevance based on the paths in a given item graph, has been shown to capture relevance effectively in many applications. Despite their effectiveness, path-based relevance metrics normally require time-consuming iterative computation. We propose an approach that quickly obtains the top-k most relevant items for a given query item set, without having to compute converged scores. The approach is designed for a distributed environment, which allows it to scale to massive graphs with hundreds of millions of nodes. Our experimental results show that the proposed approach produces results 20 to 50 times faster than a previously proposed approach and scales well with both the input size and the number of machines used in the computation.