首页> 外文期刊>Journal of Information Science >A novel algorithm for scalable k-nearest neighbour graph construction
【24h】

A novel algorithm for scalable k-nearest neighbour graph construction

机译:可伸缩k最近邻图构造的新算法

获取原文
获取原文并翻译 | 示例
           

摘要

Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations with wide application in various areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features have Zipfian distribution, a characteristic observed in majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms other algorithms in various types of high-dimensional datasets.
机译:查找数据集中每个节点的k个近邻是最重要的数据操作之一,在建议和信息检索等各个领域都有广泛的应用。然而,一个主要的挑战是,随着节点数量或维度的增加,现有方法的执行时间会迅速增长。在本文中,我们提出了贪婪过滤,这是一种用于查找近似k最近邻图的有效且可扩展的算法。它通过滤除没有任何匹配维的值较大的节点对,为每个节点选择固定数量的节点作为候选节点。贪心过滤可在线性执行时间内跨节点实现一致的近似精度。我们还提出了一种更快的贪婪过滤版本,该版本在节点前缀上使用了倒排索引。通过理论分析,我们表明,贪婪过滤对于特征具有Zipfian分布的数据集是有效的,Zipfian分布是大多数大型数据集中观察到的特征。我们还针对(a)三种最先进的算法以及(b)相关研究领域中的三种算法进行了广泛的比较实验。我们的实验结果表明,在各种类型的高维数据集中,贪心过滤始终优于其他算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号