Efficient Indexing of Billion-Scale Datasets of Deep Descriptors

Abstract

Existing billion-scale nearest neighbor search systems have mostly been compared on a single dataset of one billion SIFT vectors, where systems based on the Inverted Multi-Index (IMI) have performed very well, achieving state-of-the-art recall in several milliseconds. SIFT-like descriptors, however, are quickly being replaced with descriptors based on deep neural networks (DNN) that provide better performance for many computer vision tasks. In this paper, we introduce a new dataset of one billion descriptors based on DNNs and reveal the relative inefficiency of IMI-based indexing for such descriptors compared to SIFT data. We then introduce two new indexing structures, the Non-Orthogonal Inverted Multi-Index (NO-IMI) and the Generalized Non-Orthogonal Inverted Multi-Index (GNO-IMI). We show that, due to their additional flexibility, the new structures adapt better to the DNN descriptor distribution. In particular, extensive experiments on the new dataset demonstrate that these data structures provide a considerably better trade-off between retrieval speed and recall, given a similar amount of memory, than the standard Inverted Multi-Index.
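
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a standard Inverted Multi-Index, not the paper's NO-IMI or GNO-IMI and not the authors' code. It assumes the usual formulation: each vector is split into two halves, each half is assigned to its nearest codeword, and the pair of codeword indices names a cell. All function names (`sq_dist`, `nearest`, `build_imi`, `multi_sequence`) are hypothetical, and the codebooks are simply sampled from the data instead of being trained with k-means, purely to keep the example short.

```python
import heapq
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def build_imi(data, cb1, cb2):
    """Map each cell (i, j) to the ids of the vectors whose halves quantize to i and j."""
    half = len(cb1[0])
    cells = defaultdict(list)
    for idx, v in enumerate(data):
        cells[(nearest(cb1, v[:half]), nearest(cb2, v[half:]))].append(idx)
    return cells


def multi_sequence(query, cb1, cb2, cells, want):
    """Visit cells in order of increasing d1 + d2 until `want` candidates are collected."""
    half = len(cb1[0])
    # Codewords of each half, sorted by distance to the matching query half.
    d1 = sorted((sq_dist(c, query[:half]), i) for i, c in enumerate(cb1))
    d2 = sorted((sq_dist(c, query[half:]), j) for j, c in enumerate(cb2))
    heap = [(d1[0][0] + d2[0][0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < want:
        _, a, b = heapq.heappop(heap)
        out.extend(cells.get((d1[a][1], d2[b][1]), []))
        # Push the two grid neighbours; `seen` prevents duplicate pushes.
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(d1) and nb < len(d2) and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (d1[na][0] + d2[nb][0], na, nb))
    return out


# Toy usage on random data (assumed sizes, chosen only for illustration).
random.seed(0)
dim, n, k = 8, 500, 16
data = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
cb1 = [v[:dim // 2] for v in random.sample(data, k)]
cb2 = [v[dim // 2:] for v in random.sample(data, k)]
cells = build_imi(data, cb1, cb2)
candidates = multi_sequence(data[0], cb1, cb2, cells, want=20)
print(len(candidates), 0 in candidates)
```

With k codewords per half the index has k × k cells while storing only 2k codewords. The cell centroids are constrained to be concatenations of the two half-codewords, which is the kind of rigidity the abstract contrasts with the "additional flexibility" of the new structures.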

Bibliographic details

Source: IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2055-2063 (9 pages)
Authors: Artem Babenko (Yandex); Victor Lempitsky
Original format: PDF
Keywords: Indexing; Computer vision; Correlation; Vector quantization

Similar literature

1. Effective and efficient indexing in cross-modal hashing-based datasets [J]. Intelligence: A Multidisciplinary Journal. 2020
2. Effective and efficient indexing in cross-modal hashing-based datasets [J]. Chiu Chih-Yi, Markchit Sarawut. Signal Processing. Image Communication: A Publication of the European Association for Signal Processing. 2020
3. An efficient automated biospeckle indexing strategy using morphological and geo-statistical descriptors [J]. Chatterjee Amit, Singh Puneet, Bhatia Vimal. Optics and Lasers in Engineering. 2020
4. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors [C]. Artem Babenko (Yandex), Victor Lempitsky. IEEE Conference on Computer Vision and Pattern Recognition. 2016
5. Time series retrieval: Indexing and mining large datasets [D]. Shieh, Jin-Wien. 2010
6. REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets [O]. Camille Marchet, Zamin Iqbal, Daniel Gautheret
7. REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets [O]. Camille Marchet, Zamin Iqbal, Daniel Gautheret. 2020
