Fast Scalable Approximate Nearest Neighbor Search for High-dimensional Data

Abstract

K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scaling k-NN search algorithms to billion-scale datasets with high dimensionality. In this paper, we propose a solution towards this end that uses vantage point trees to partition the dataset across multiple processes and exploits an existing graph-based sequential approximate k-NN search algorithm called HNSW (Hierarchical Navigable Small World) to search locally within a process. Our hybrid MPI-OpenMP solution employs techniques including MPI one-sided communication to reduce communication times and partition replication for better load balancing across processes. We demonstrate computation of k-NN for 10,000 queries on the order of seconds using our approach on ∼8000 cores on a dataset of one billion points in a 128-dimensional space. We also show a 10X speedup over a completely k-d tree-based solution for the same dataset, thus demonstrating the better suitability of our solution for high-dimensional datasets. Our solution shows almost linear strong scaling,
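The partition-then-search-locally idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names (`build_vp_tree`, `route`, `approx_knn`) are hypothetical, the vantage point is chosen at random and the split radius is the median distance (both assumptions), each leaf stands in for the data held by one process, and a brute-force scan stands in for the HNSW index that would be used inside a process. MPI communication and partition replication are not modeled.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_vp_tree(points, ids, depth, rng):
    """Split `ids` into up to 2**depth partitions with vantage points.

    Assumption: the vantage point is picked at random and the split
    radius `mu` is the median distance to it, giving balanced halves.
    """
    if depth == 0 or len(ids) < 2:
        return {"ids": ids}  # leaf: one partition (one process in a real run)
    vp = points[rng.choice(ids)]
    by_dist = sorted((dist(points[i], vp), i) for i in ids)
    mid = len(by_dist) // 2
    mu = by_dist[mid][0]
    inside = [i for _, i in by_dist[:mid]]
    outside = [i for _, i in by_dist[mid:]]
    return {
        "vp": vp,
        "mu": mu,
        "inside": build_vp_tree(points, inside, depth - 1, rng),
        "outside": build_vp_tree(points, outside, depth - 1, rng),
    }


def leaves(tree):
    """Return the list of partitions (lists of point ids) in the tree."""
    if "ids" in tree:
        return [tree["ids"]]
    return leaves(tree["inside"]) + leaves(tree["outside"])


def route(tree, query):
    """Descend to the single partition whose region contains `query`."""
    node = tree
    while "ids" not in node:
        near = dist(query, node["vp"]) < node["mu"]
        node = node["inside"] if near else node["outside"]
    return node["ids"]


def approx_knn(tree, points, query, k):
    """Approximate k-NN: search only the partition the query routes to.

    The brute-force scan here is a stand-in for a local HNSW index.
    """
    candidates = route(tree, query)
    return sorted(candidates, key=lambda i: dist(points[i], query))[:k]
```

Because only one partition is searched, true neighbors that lie just across a partition boundary can be missed, which is why the result is approximate. A depth of `d` yields up to `2**d` partitions, so in a distributed setting the depth would be chosen to match the number of processes.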

Bibliographic details

Source
IEEE International Conference on Cluster Computing | 2020 | pp. 294-302 | 9 pages
Authors
K G Renga Bashyam; Sathish Vadhiyar
Original format: PDF
Keywords
K-NN Search; Parallel Algorithms; Load Balancing; Vantage Point Tree; HNSW

Similar literature

1. Exploiting lower bounds to accelerate approximate nearest neighbor search on high-dimensional data [J]. Liu Yingfan, Wei Hao, Cheng Hong. Information Sciences: An International Journal. 2018.
2. High-dimensional image descriptor matching using highly parallel KD-tree construction and approximate nearest neighbor search [J]. Hu Linjia, Nooshabadi Saeid. Journal of Parallel and Distributed Computing. 2019, Issue OCTa.
3. Distance Encoded Product Quantization for Approximate K-Nearest Neighbor Search in High-Dimensional Space [J]. Heo Jae-Pil, Lin Zhe, Yoon Sung-Eui. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019, Issue 9.
4. Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph [C]. Yan Xiao, Jiafeng Guo, Yanyan Lan. AAAI Conference on Artificial Intelligence; Innovative Applications of Artificial Intelligence Conference; Symposium on Educational Advances in Artificial Intelligence. 2018.
5. Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach [D]. Buaba, Ruben. 2012.
6. Approximate Nearest Neighbor Search by Residual Vector Quantization [O]. Yongjian Chen, Tao Guan, Cheng Wang. 2010.
7. HDIdx: High-Dimensional Indexing for Efficient Approximate Nearest Neighbor Search [O]. Wan, Ji; Tang, Sheng; Zhang, Yongdong. 2015.
