Conference: Asian Conference on Computer Vision

Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework


Abstract

In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of whether co-occurrences of the different objects are available in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the-art performance on every dataset.
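
The abstract does not give the loss functions or the network details, so the following is only a minimal sketch of the underlying idea: finding the lowest-cost one-to-one assignment between query embeddings and per-object image features, which is the problem the Hungarian Algorithm solves. Everything here is an assumption made for illustration: the function names `cosine_distance` and `align_queries_to_objects`, the choice of cosine distance as the cost, and the toy 2-D vectors. For readability the sketch searches all permutations instead of running the Hungarian Algorithm itself; the result is the same optimal assignment, but this is only practical for a handful of objects.

```python
from itertools import permutations
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors (assumed cost)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def align_queries_to_objects(query_embs, object_embs):
    """Return (assignment, total_cost).

    assignment[i] is the index of the object feature matched to query i.
    Brute-force stand-in for the Hungarian Algorithm: it tries every
    one-to-one mapping and keeps the cheapest one.
    """
    n_queries, n_objects = len(query_embs), len(object_embs)
    if n_queries > n_objects:
        raise ValueError("need at least as many objects as queries")

    # Pairwise cost matrix: rows are queries, columns are objects.
    cost = [[cosine_distance(q, o) for o in object_embs] for q in query_embs]

    best_assignment, best_cost = None, math.inf
    for perm in permutations(range(n_objects), n_queries):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_assignment, best_cost = perm, total
    return list(best_assignment), best_cost


# Hypothetical toy data: two query embeddings (e.g. one sketch, one text)
# and three attended object features from one image.
queries = [[1.0, 0.0], [0.0, 1.0]]
objects = [[0.1, 0.9], [0.9, 0.1], [0.7, 0.7]]
assignment, total_cost = align_queries_to_objects(queries, objects)
print(assignment, round(total_cost, 4))
```

For larger problems, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time. In a training setting, the matched pairs would then feed whichever loss the model uses, which the abstract leaves unspecified.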
