Multi-modal Retrieval via Deep Textual-Visual Correlation Learning

Abstract

In this paper, we consider multi-modal retrieval from the perspective of deep textual-visual learning, with the aim of preserving the correlations between multi-modal data. More specifically, we propose a general multi-modal retrieval algorithm that maximizes the canonical correlations between multi-modal data via deep learning, which we call Deep Textual-Visual correlation learning (DTV). Given pairs of images and the documents that describe them, DTV uses a convolutional neural network to learn visual representations of the images and a dependency-tree recursive neural network (DT-RNN) to learn compositional textual representations of the documents. DTV then projects the visual and textual representations into a common embedding space by matrix-vector canonical correlation analysis (CCA), so that each pair of multi-modal data is maximally correlated subject to being uncorrelated with the other pairs. The experimental results indicate the effectiveness of the proposed DTV when applied to multi-modal retrieval.
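
The projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' matrix-vector CCA and does not include the CNN or DT-RNN: it assumes plain linear CCA with a small ridge term, applied to synthetic feature matrices that stand in for the learned visual and textual representations. The function names (`fit_linear_cca`, `rank_images`), the `reg` parameter, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def fit_linear_cca(X, Y, k, reg=1e-3):
    """Plain linear CCA (an assumption; the paper uses matrix-vector CCA).

    X: (n, dx) visual features, Y: (n, dy) textual features.
    Returns the means, both projection matrices, and the top-k correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance estimates keep the whitening step well conditioned.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return mx, my, Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

def rank_images(text_vec, image_feats, model):
    """Rank images by cosine similarity to a text query in the shared space."""
    mx, my, Wx, Wy, _ = model
    q = (text_vec - my) @ Wy
    G = (image_feats - mx) @ Wx
    q = q / np.linalg.norm(q)
    G = G / np.linalg.norm(G, axis=1, keepdims=True)
    return np.argsort(-(G @ q))

# Synthetic stand-ins for paired image/document features (hypothetical data).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 4))                      # shared latent factors
X = Z @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(500, 20))
Y = Z @ rng.normal(size=(4, 12)) + 0.1 * rng.normal(size=(500, 12))

model = fit_linear_cca(X, Y, k=4)
ranking = rank_images(Y[0], X, model)              # image indices, best match first
print("canonical correlations:", model[4])
```

In this sketch, text-to-image retrieval reduces to projecting the query document and all candidate images into the shared space and sorting by cosine similarity; image-to-text retrieval would swap the roles of the two projections.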

Bibliographic details

Source
International Conference on Intelligence Science and Big Data Engineering | 2015 | 10 pages
Authors
Jun Song; Yueyang Wang; Fei Wu; Weiming Lu; Siliang Tang; Yueting Zhuang
Original format: PDF
Chinese Library Classification: TP18-53
Keywords
Multi-modal retrieval; Deep learning; CCA

Similar literature

1. Hybrid textual-visual relevance learning for content-based image retrieval [J]. Cui Chaoran, Lin Peiguang, Nie Xiushan. Journal of Visual Communication & Image Representation, 2017, issue octa.
2. Effective deep learning-based multi-modal retrieval [J]. Wang Wei, Yang Xiaoyan, Ooi Beng Chin. The VLDB Journal, 2016, issue 1.
3. Deep cascaded cross-modal correlation learning for fine-grained sketch-based image retrieval [J]. Pattern Recognition: The Journal of the Pattern Recognition Society, 2020.
4. Multi-modal Retrieval via Deep Textual-Visual Correlation Learning [C]. Jun Song, Yueyang Wang, Fei Wu. International Conference on Intelligence Science and Big Data Engineering, 2015.
5. Advancing Multi-modal Deep Learning: Towards Language-grounded Visual Understanding [D]. Kafle, Kushal. 2020.
6. Distribution Structure Learning Loss (DSLL) Based on Deep Metric Learning for Image Retrieval [O]. Lili Fan, Hongwei Zhao, Haoyu Zhao. 2019.
7. Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval [O]. Shen, Yuming; Liu, Li; Shao, Ling. 2017.
