首页> 外文会议>International conference on database systems for advanced applications >Greedy Filtering: A Scalable Algorithm for K-Nearest Neighbor Graph Construction
【24h】

Greedy Filtering: A Scalable Algorithm for K-Nearest Neighbor Graph Construction

机译:贪婪过滤:K最近邻图构建的可扩展算法

获取原文

摘要

Finding the k-nearest neighbors for every node is one of the most important data mining tasks as a primitive operation in the field of Information Retrieval and Recommender Systems. However, existing approaches to this problem do not perform as well when the number of nodes or dimensions is scaled up. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate fc-nearest neighbor graph by filtering node pairs whose large value dimensions do not match at all. In order to avoid skewness in the results and guarantee a time complexity of O(n), our algorithm chooses essentially a fixed number of node pairs as candidates for every node. We also present a faster version of greedy filtering based on the use of inverted indices for the node prefixes. We conduct extensive experiments in which we (ⅰ) compare our approaches to the state-of-the-art algorithms in seven different types of datasets, and (ⅱ) adopt other algorithms in related fields (similarity join, top-k similarity join and similarity search fields) to solve this problem and evaluate them. The experimental results show that greedy filtering guarantees a high level of accuracy while also being much faster than other algorithms for large amounts of high-dimensional data.
机译:作为信息检索和推荐系统领域中的原始操作,为每个节点找到k个最近的邻居是最重要的数据挖掘任务之一。但是,当按比例增加节点数或维度时,解决此问题的现有方法效果不佳。在本文中,我们提出贪婪过滤,这是一种高效且可扩展的算法,可通过过滤大小值根本不匹配的节点对来找到近似的fc最近邻居图。为了避免结果偏斜并保证O(n)的时间复杂度,我们的算法实质上选择固定数量的节点对作为每个节点的候选对象。我们还基于节点前缀的倒排索引的使用,提出了一种更快版本的贪婪过滤。我们进行了广泛的实验,其中(ⅰ)在7种不同类型的数据集中比较我们与最新算法的方法,并且(ⅱ)在相关领域采用其他算法(相似性连接,top-k相似性连接和相似性搜索字段)来解决此问题并对其进行评估。实验结果表明,贪婪过滤可确保较高的准确性,同时也比其他算法处理大量高维数据要快得多。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号