IEEE/CVF Conference on Computer Vision and Pattern Recognition

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Abstract

Textual-visual cross-modal retrieval has been a hot research topic in both the computer vision and natural language processing communities. Learning appropriate representations for multi-modal data is crucial for cross-modal retrieval performance. Unlike existing image-text retrieval approaches that embed image-text pairs as single feature vectors in a common representational space, we propose to incorporate generative processes into the cross-modal feature embedding, through which we are able to learn not only global abstract features but also local grounded features. Extensive experiments show that our framework can accurately match images and sentences with complex content, and achieves state-of-the-art cross-modal retrieval results on the MSCOCO dataset.
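The abstract does not give implementation details, but the core idea can be sketched: encode images and sentences into a shared embedding space trained with a matching objective, and add a generative image-to-caption branch whose reconstruction loss grounds the learned features. The following is a minimal PyTorch sketch under stated assumptions; the module choices, dimensions, and the hinge-based triplet ranking loss are illustrative conventions from the image-text matching literature, not the authors' architecture.

```python
# A minimal sketch (not the authors' code) of cross-modal embedding with a
# generative branch. All names, sizes, and losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_vocab=10000, embed_dim=512):
        super().__init__()
        self.img_fc = nn.Linear(img_dim, embed_dim)      # global image embedding
        self.txt_embed = nn.Embedding(txt_vocab, embed_dim)
        self.txt_rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        # generative branch: decode a caption conditioned on the image embedding
        self.decoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, txt_vocab)

    def encode_image(self, img_feat):
        return F.normalize(self.img_fc(img_feat), dim=-1)

    def encode_text(self, tokens):
        _, h = self.txt_rnn(self.txt_embed(tokens))      # h: (1, B, embed_dim)
        return F.normalize(h.squeeze(0), dim=-1)

    def caption_logits(self, img_feat, tokens):
        # teacher-forced decoding from the image embedding as initial state
        h0 = self.img_fc(img_feat).unsqueeze(0)
        out, _ = self.decoder(self.txt_embed(tokens), h0)
        return self.out(out)

def matching_loss(img_emb, txt_emb, margin=0.2):
    # hinge-based triplet ranking loss over in-batch negatives; assumes
    # image i and sentence i in the batch are the positive pair
    scores = img_emb @ txt_emb.t()
    pos = scores.diag().unsqueeze(1)
    cost_s = (margin + scores - pos).clamp(min=0)        # sentence negatives
    cost_im = (margin + scores - pos.t()).clamp(min=0)   # image negatives
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_s.masked_fill(mask, 0).sum() + cost_im.masked_fill(mask, 0).sum()

# toy joint training step: matching loss + caption reconstruction loss
model = CrossModalEmbedding()
img = torch.randn(4, 2048)                  # e.g. pooled CNN features
txt = torch.randint(1, 10000, (4, 12))      # token ids
i_emb, t_emb = model.encode_image(img), model.encode_text(txt)
logits = model.caption_logits(img, txt[:, :-1])
gen_loss = F.cross_entropy(logits.reshape(-1, 10000), txt[:, 1:].reshape(-1))
loss = matching_loss(i_emb, t_emb) + gen_loss
loss.backward()
```

In such a setup, the matching loss shapes the global abstract features of the shared space, while the generative reconstruction loss pushes the image representation to retain the locally grounded detail needed to regenerate the sentence.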
