IEEE Transactions on Pattern Analysis and Machine Intelligence

Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors



Abstract

Visual search and image retrieval underpin numerous applications; however, the task remains challenging, predominantly due to the variability of object appearance and the ever-increasing size of the databases, which often exceed billions of images. Prior-art methods rely on the aggregation of local scale-invariant descriptors, such as SIFT, via mechanisms including the Bag of Visual Words (BoW), the Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV). However, their performance still falls short of what is required. This paper presents a novel method for deriving a compact and distinctive representation of image content, called the Robust Visual Descriptor with Whitening (RVD-W). It significantly advances the state of the art. In our approach, local descriptors are rank-assigned to multiple clusters. Residual vectors are then computed in each cluster, normalized using a direction-preserving normalization function, and aggregated based on the neighborhood rank. Importantly, the residual vectors are de-correlated and whitened in each cluster before aggregation, leading to a balanced energy distribution across dimensions and significantly improved performance. We also propose a new post-PCA normalization approach which improves the separability between matching and non-matching global descriptors. This new normalization benefits not only our RVD-W descriptor but also improves existing approaches based on FV and VLAD aggregation. Furthermore, we show that the aggregation framework developed with hand-crafted SIFT features also performs exceptionally well with Convolutional Neural Network (CNN) based features. The RVD-W pipeline outperforms state-of-the-art global descriptors on both the Holidays and Oxford datasets.
On the large-scale datasets Holidays1M and Oxford1M, the SIFT-based RVD-W representation obtains a mAP of 45.1 and 35.1 percent respectively, while the CNN-based RVD-W achieves a mAP of 63.5 and 44.8 percent, all superior to the state of the art.
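The aggregation steps described in the abstract (rank assignment of each local descriptor to several clusters, residual computation, direction-preserving normalization, and rank-weighted accumulation) can be sketched as follows. This is a minimal illustrative sketch of the general idea, not the authors' exact RVD-W pipeline: the function name, the rank weights, and the final L2 normalization (standing in for the learned per-cluster whitening, which requires matrices estimated offline from training data) are all assumptions.

```python
import numpy as np

def aggregate_rvd_like(descriptors, centers, k_nn=3, rank_weights=(1.0, 0.5, 0.25)):
    """VLAD-style aggregation with rank assignment to multiple clusters.

    A sketch of the ideas in the abstract; the per-cluster de-correlation /
    whitening step of RVD-W (learned offline) is replaced here by a simple
    L2 normalization of the final global descriptor.
    """
    K, d = centers.shape
    agg = np.zeros((K, d))
    # Distance from every descriptor to every cluster center.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    # Rank-assign each descriptor to its k_nn nearest clusters.
    nearest = np.argsort(dists, axis=1)[:, :k_nn]
    for x, clusters in zip(descriptors, nearest):
        for rank, c in enumerate(clusters):
            r = x - centers[c]                 # residual vector in cluster c
            n = np.linalg.norm(r)
            if n > 0:
                r = r / n                      # direction-preserving normalization
            agg[c] += rank_weights[rank] * r   # weight by neighborhood rank
    v = agg.ravel()                            # concatenate per-cluster aggregates
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The resulting global descriptor has dimension K×d and unit L2 norm, so two images can be compared by a dot product; in the full pipeline a PCA projection and the proposed post-PCA normalization would follow.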