IEEE Transactions on Image Processing

REMAP: Multi-Layer Entropy-Guided Pooling of Dense CNN Features for Image Retrieval


Abstract

This paper addresses the problem of very large-scale image retrieval, focusing on improving its accuracy and robustness. We target enhanced robustness of search to factors such as variations in illumination, object appearance and scale, partial occlusions, and cluttered backgrounds, which is particularly important when a search is performed across very large datasets with significant variability. We propose a novel CNN-based global descriptor, called REMAP, which learns and aggregates a hierarchy of deep features from multiple CNN layers and is trained end-to-end with a triplet loss. REMAP explicitly learns discriminative features that are mutually supportive and complementary at various semantic levels of visual abstraction. These dense local features are max-pooled spatially at each layer, within multi-scale overlapping regions, before aggregation into a single image-level descriptor. To identify the regions and layers that are semantically useful for retrieval, we propose to measure the information gain of each region and layer using KL-divergence. Our system effectively learns during training how useful the various regions and layers are and weights them accordingly. We show that such relative entropy-guided aggregation outperforms classical CNN-based aggregation controlled by SGD. The entire framework is trained in an end-to-end fashion, outperforming the latest state-of-the-art results. On the Holidays, Oxford, and MPEG image retrieval datasets, the REMAP descriptor achieves mAP of 95.5%, 91.5%, and 80.1%, respectively, outperforming any results published to date. REMAP also formed the core of the winning submission to the Google Landmark Retrieval Challenge on Kaggle.
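The aggregation pipeline the abstract describes — spatial max-pooling of dense features over multi-scale overlapping regions at each layer, weighting regions and layers, then combining into a single global descriptor — can be sketched roughly as follows. This is a minimal illustration under simplified assumptions, not the authors' implementation: the region grid, the weight-assignment step, and all function names are hypothetical, and the learned KL-divergence-based weights are stood in for by an optional `region_weights` argument.

```python
import numpy as np

def multiscale_regions(h, w, scales=(1, 2, 3)):
    """Generate overlapping square regions (R-MAC style) over an h x w feature grid."""
    regions = []
    for s in scales:
        size = int(2 * min(h, w) / (s + 1))  # region side length shrinks with scale
        if size < 1:
            continue
        ys = np.linspace(0, h - size, s).astype(int) if s > 1 else [0]
        xs = np.linspace(0, w - size, s).astype(int) if s > 1 else [0]
        for y in ys:
            for x in xs:
                regions.append((y, x, size))
    return regions

def remap_style_descriptor(feature_maps, region_weights=None):
    """Aggregate dense CNN features from several layers into one global descriptor.

    feature_maps: list of arrays, one per CNN layer, each shaped (h, w, c).
    region_weights: optional per-layer lists of scalars (in REMAP these would be
    learned from KL-divergence information gain); uniform weights if None.
    """
    parts = []
    for li, fmap in enumerate(feature_maps):
        h, w, _ = fmap.shape
        pooled = []
        for ri, (y, x, size) in enumerate(multiscale_regions(h, w)):
            v = fmap[y:y + size, x:x + size, :].max(axis=(0, 1))  # spatial max-pool
            wgt = 1.0 if region_weights is None else region_weights[li][ri]
            pooled.append(wgt * v)
        layer_vec = np.sum(pooled, axis=0)
        layer_vec /= (np.linalg.norm(layer_vec) + 1e-12)  # L2-normalize per layer
        parts.append(layer_vec)
    desc = np.concatenate(parts)  # stack complementary layer descriptors
    return desc / (np.linalg.norm(desc) + 1e-12)
```

In this sketch each layer contributes a fixed-length vector (its channel count), so the final descriptor length is the sum of the layers' channel dimensions; retrieval would then compare descriptors by inner product or Euclidean distance.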
