International Conference on Intelligence Science and Big Data Engineering

Multi-modal Retrieval via Deep Textual-Visual Correlation Learning


Abstract

In this paper, we consider multi-modal retrieval from the perspective of deep textual-visual learning, so as to preserve the correlations between multi-modal data. More specifically, we propose a general multi-modal retrieval algorithm that maximizes the canonical correlations between multi-modal data via deep learning, which we call Deep Textual-Visual correlation learning (DTV). In DTV, given pairs of images and their describing documents, a convolutional neural network is used to learn visual representations of the images, and a dependency-tree recursive neural network (DT-RNN) is used to learn compositional textual representations of the documents. DTV then projects the visual and textual representations into a common embedding space via matrix-vector canonical correlation analysis (CCA), in which each pair of multi-modal data is maximally correlated while remaining uncorrelated with other pairs. Experimental results demonstrate the effectiveness of the proposed DTV when applied to multi-modal retrieval.
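To make the projection-and-retrieval step concrete, below is a minimal Python sketch. It is an approximation, not the paper's implementation: the authors learn deep features jointly with matrix-vector CCA, whereas this sketch applies scikit-learn's linear CCA to stand-in features whose dimensions and names are assumptions.

# A minimal, hypothetical sketch of the correlation-learning step described
# above. Not the authors' method: the paper couples CCA with deep feature
# learning, while this applies scikit-learn's linear CCA to pre-extracted
# stand-in features. All dimensions and names here are assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_pairs = 500
img_feats = rng.normal(size=(n_pairs, 512))  # stand-in for CNN image features
txt_feats = rng.normal(size=(n_pairs, 300))  # stand-in for DT-RNN sentence vectors

# Fit projections that maximize correlation between the paired views.
cca = CCA(n_components=64, max_iter=1000)
cca.fit(img_feats, txt_feats)
img_emb, txt_emb = cca.transform(img_feats, txt_feats)

def cosine_sim(a, b):
    # Row-wise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Text-to-image retrieval: rank all images for the first text query
# by similarity in the shared embedding space.
ranking = np.argsort(-cosine_sim(txt_emb[:1], img_emb))[0]
print("Top-5 retrieved image indices for query 0:", ranking[:5])

In the paper the two views are produced by the CNN and the DT-RNN and the CCA objective is optimized together with them; the sketch only illustrates how paired embeddings in a common space support cross-modal ranking.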
