Approximate nearest neighbor search (ANNS) is a fundamental problem in machine learning and computer vision. An ANNS algorithm must be efficient in both memory use and search performance. Recently, graph-based methods have achieved revolutionary performance on public datasets. The earliest approach, GNNS, is a greedy algorithm based on a $k$-nearest-neighbor graph ($k$NN graph). Improvements on GNNS mainly focus on two aspects: (1) some provide a better initial search position to prevent the search from getting stuck in local optima; (2) others try to construct better graphs for faster traversal and neighbor locating. However, we find that three problems exist in these works: the unconnected cluster problem, the detouring problem, and the large index problem (memory inefficiency for large-scale search). In this paper, we present a novel graph structure named Navigating Spreading-out Graph (NSG) to tackle the above three problems simultaneously, without introducing any other index structures. Extensive experiments show that our algorithm significantly outperforms all the existing algorithms. Moreover, our algorithm outperforms the existing approach of Taobao (Alibaba Group) and has been integrated into their search engine for billion-scale search.
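To make the greedy, $k$NN-graph-based search idea concrete, the following is a minimal Python sketch. It is not the GNNS or NSG implementation: the function names `build_knn_graph` and `greedy_search`, the brute-force graph construction, the 2-D random data, and the parameter values are all assumptions chosen for illustration only.

```python
import math
import random


def build_knn_graph(points, k):
    """Brute-force kNN graph (illustrative only): node i -> indices of its k closest points."""
    graph = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph.append(others[:k])
    return graph


def greedy_search(points, graph, query, start):
    """Hill-climb from `start`: move to the neighbor closest to `query` until none is closer."""
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for nb in graph[current]:
            d = math.dist(points[nb], query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:
            # Local optimum: no neighbor is closer. This need not be the true nearest neighbor.
            return current, current_dist
        current, current_dist = best, best_dist


# Hypothetical toy data: 200 random points in the unit square.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points, k=8)

query = (0.5, 0.5)
found, found_dist = greedy_search(points, graph, query, start=0)
exact_dist = min(math.dist(p, query) for p in points)
```

Comparing `found_dist` with `exact_dist` shows how far the greedy result is from the exact answer for a given start node. The search stops at the first node with no closer neighbor, which is the "stuck in local optima" behavior the abstract refers to, and is why the choice of starting position and the graph structure both matter.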