End-to-End Learning of Deep Visual Representations for Image Retrieval

Abstract

While deep learning has become a key ingredient in the top performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that reasons for the underwhelming results of deep methods on image retrieval are threefold: (1) noisy training data, (2) inappropriate deep architecture, and (3) suboptimal training procedure. We address all three issues. First, we leverage a large-scale but noisy landmark dataset and develop an automatic cleaning method that produces a suitable training set for deep retrieval. Second, we build on the recent R-MAC descriptor, show that it can be interpreted as a deep and differentiable architecture, and present improvements to enhance it. Last, we train this network with a siamese architecture that combines three streams with a triplet loss. At the end of the training process, the proposed architecture produces a global image representation in a single forward pass that is well suited for image retrieval. Extensive experiments show that our approach significantly outperforms previous retrieval approaches, including state-of-the-art methods based on costly local descriptor indexing and spatial verification. On Oxford 5k, Paris 6k and Holidays, we respectively report 94.7, 96.6, and 94.8 mean average precision. Our representations can also be heavily compressed using product quantization with little loss in accuracy.
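The three-stream siamese training described above optimizes a triplet ranking loss over global image descriptors: an anchor image, a relevant (positive) image, and a non-relevant (negative) image each pass through the same network, and the loss pushes the positive closer to the anchor than the negative by some margin. A minimal sketch of such a margin-based triplet loss on L2-normalized descriptors is below; the function names and the margin value are illustrative, not the paper's exact implementation.

```python
def l2_normalize(v):
    """Scale a vector to unit L2 norm (retrieval descriptors are
    typically compared after normalization)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)

def squared_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet ranking loss: zero when the positive is
    already closer to the anchor than the negative by at least
    `margin`, otherwise the size of the violation."""
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    n = l2_normalize(negative)
    return max(0.0, margin + squared_dist(a, p) - squared_dist(a, n))
```

When the triplet is already well ranked (e.g. the positive descriptor equals the anchor and the negative is far away), the loss is zero and contributes no gradient; a violating triplet yields a positive loss proportional to how badly the ranking constraint is broken.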
