首页> 外文OA文献 >Efficient Vector Quantization for Fast Approximate Nearest Neighbor Search
【2h】

Efficient Vector Quantization for Fast Approximate Nearest Neighbor Search

机译:快速近似最近邻搜索的有效矢量量化

摘要

Increasing sizes of databases and data stores mean that the traditional tasks, such as locating a nearest neighbor for a given data point, become too complex for classical solutions to handle. Exact solutions have been shown to scale poorly with dimensionality of the data. Approximate nearest neighbor search (ANN) is a practical compromise between accuracy and performance; it is widely applicable and is a subject of much research.Amongst a number of ANN approaches suggested in the recent years, the ones based on vector quantization stand out, achieving state-of-the-art results. Product quantization (PQ) decomposes vectors into subspaces for separate processing, allowing for fast lookup-based distance calculations. Additive quantization (AQ) drops most of PQ constraints, currently providing the best search accuracy on image descriptor datasets, but at a higher computational cost. This thesis work aims to reduce the complexity of AQ by changing a single most expensive step in the process – that of vector encoding. Both the outstanding search performance and high costs of AQ come from its generality, therefore by imposing some novel external constraints it is possible to achieve a better compromise: reduce complexity while retaining the accuracy advantage over other ANN methods. We propose a new encoding method for AQ – pyramid encoding. It requires significantly less calculations compared to the original “beam search” encoding, at the cost of an increased greediness of the optimization procedure. As its performance depends heavily on the initialization, the problem of choosing a starting point is also discussed. The results achieved by applying the proposed method are compared with the current state-of-the-art on two widely used benchmark datasets – GIST1M and SIFT1M, both generated from a real-world image data and therefore closely modeling practical applications. AQ with pyramid encoding, in addition to its computational benefits, is shown to achieve similar or better search performance than competing methods. However, its current advantages seem to be limited to data of a certain internal structure. Further analysis of this drawback provides us with the directions of possible future work.
机译:数据库和数据存储大小的增加意味着传统任务(例如为给定数据点定位最近的邻居)变得过于复杂,以致于经典解决方案无法处理。事实证明,精确的解决方案无法随数据的维数扩展。近似最近邻搜索(ANN)是准确性和性能之间的实际折衷;在近年来提出的许多人工神经网络方法中,基于矢量量化的方法脱颖而出,取得了最新的成果。产品量化(PQ)将向量分解为子空间以进行单独处理,从而允许基于快速查找的距离计算。加性量化(AQ)删除了大多数PQ约束,目前在图像描述符数据集上提供最佳搜索精度,但计算成本较高。本文的工作旨在通过更改过程中最昂贵的单个步骤-矢量编码来降低AQ的复杂性。 AQ的出色搜索性能和高成本都来自其通用性,因此,通过施加一些新颖的外部约束,可以实现更好的折衷:降低复杂度,同时保留优于其他ANN方法的准确性优势。我们提出一种用于AQ的新编码方法–金字塔编码。与原始的“波束搜索”编码相比,它所需的计算量要少得多,但代价是优化程序的贪婪性增加。由于其性能在很大程度上取决于初始化,因此还讨论了选择起点的问题。在两种广泛使用的基准数据集– GIST1M和SIFT1M上,通过应用所提出的方法获得的结果与当前的最新技术进行了比较,这两个数据集都是从真实的图像数据生成的,因此可以紧密地模拟实际应用。具有金字塔编码的AQ除了具有计算优势外,还显示出与竞争方法相比具有相似或更好的搜索性能。但是,其当前的优势似乎仅限于某种内部结构的数据。对这一缺陷的进一步分析为我们提供了未来可能工作的方向。

著录项

  • 作者

    Muravev Anton;

  • 作者单位
  • 年度 2016
  • 总页数
  • 原文格式 PDF
  • 正文语种 en
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号