End-to-End Learning of Deep Visual Representations for Image Retrieval

Abstract

While deep learning has become a key ingredient in the top performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that reasons for the underwhelming results of deep methods on image retrieval are threefold: (1) noisy training data, (2) inappropriate deep architecture, and (3) suboptimal training procedure. We address all three issues. First, we leverage a large-scale but noisy landmark dataset and develop an automatic cleaning method that produces a suitable training set for deep retrieval. Second, we build on the recent R-MAC descriptor, show that it can be interpreted as a deep and differentiable architecture, and present improvements to enhance it. Last, we train this network with a siamese architecture that combines three streams with a triplet loss. At the end of the training process, the proposed architecture produces a global image representation in a single forward pass that is well suited for image retrieval. Extensive experiments show that our approach significantly outperforms previous retrieval approaches, including state-of-the-art methods based on costly local descriptor indexing and spatial verification. On Oxford 5k, Paris 6k and Holidays, we respectively report 94.7, 96.6, and 94.8 mean average precision. Our representations can also be heavily compressed using product quantization with little loss in accuracy.
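The three-stream siamese training described above optimizes a triplet ranking loss over global image descriptors: an anchor image, a relevant (positive) image, and a non-relevant (negative) image each pass through the same network, and the loss pushes the positive closer to the anchor than the negative by some margin. A minimal sketch of such a margin-based triplet loss on L2-normalized descriptors is below; the function names and the margin value are illustrative, not the paper's exact implementation.

```python
def l2_normalize(v):
    """Scale a vector to unit L2 norm (retrieval descriptors are
    typically compared after normalization)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)

def squared_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet ranking loss: zero when the positive is
    already closer to the anchor than the negative by at least
    `margin`, otherwise the size of the violation."""
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    n = l2_normalize(negative)
    return max(0.0, margin + squared_dist(a, p) - squared_dist(a, n))
```

When the triplet is already well ranked (e.g. the positive descriptor equals the anchor and the negative is far away), the loss is zero and contributes no gradient; a violating triplet yields a positive loss proportional to how badly the ranking constraint is broken.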
