International Conference on Computational Science

2D-Convolution Based Feature Fusion for Cross-Modal Correlation Learning


Abstract

Cross-modal information retrieval (CMIR) enables users to search for semantically relevant data of various modalities given a query in one modality. The predominant challenge is to alleviate the "heterogeneous gap" between different modalities. For text-image retrieval, the typical solution is to project text features and image features into a common semantic space and measure cross-modal similarity there. However, semantically relevant data from different modalities usually carry imbalanced amounts of information, and aligning all modalities in the same space weakens modality-specific semantics and introduces unexpected noise. In this paper, we propose a novel CMIR framework based on multi-modal feature fusion. In this framework, cross-modal similarity is measured by directly analyzing the fine-grained correlations between text features and image features, without learning a common semantic space. Specifically, we first construct a cross-modal feature matrix that fuses the original visual and textual features. 2D-convolutional networks then reason about inner-group relationships among features across modalities, yielding fine-grained text-image representations. Cross-modal similarity is measured by a multi-layer perceptron applied to the fused feature representations. We conduct extensive experiments on two representative CMIR datasets, English Wikipedia and TVGraz. Experimental results indicate that our model significantly outperforms state-of-the-art methods. Moreover, the proposed cross-modal feature fusion approach is more effective on CMIR tasks than other feature fusion approaches.
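The abstract describes the pipeline only at a high level. The following is a minimal sketch of one plausible reading of it: both modalities are projected to a common length, stacked into a two-row cross-modal feature matrix, passed through 2D convolutions, and scored by an MLP. All layer sizes, the exact fusion layout, and the class and parameter names (ConvFusionCMIR, text_dim, img_dim, fused_dim) are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of 2D-convolution based feature fusion for CMIR.
# Dimensions and fusion layout are assumptions made for illustration.
import torch
import torch.nn as nn

class ConvFusionCMIR(nn.Module):
    def __init__(self, text_dim=300, img_dim=2048, fused_dim=256):
        super().__init__()
        # Project both modalities to a common length so they can be
        # stacked into a 2-row cross-modal feature matrix (assumed layout).
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.img_proj = nn.Linear(img_dim, fused_dim)
        # 2D convolutions reason over local groups of features across
        # the two modality rows.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(2, 3), padding=(0, 1)),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # MLP scoring head (the "multi-layer perceptron" of the abstract).
        self.mlp = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, text_feat, img_feat):
        # text_feat: (B, text_dim), img_feat: (B, img_dim)
        t = self.text_proj(text_feat)          # (B, fused_dim)
        v = self.img_proj(img_feat)            # (B, fused_dim)
        matrix = torch.stack([t, v], dim=1)    # (B, 2, fused_dim)
        matrix = matrix.unsqueeze(1)           # (B, 1, 2, fused_dim)
        fused = self.conv(matrix).flatten(1)   # (B, 32)
        return self.mlp(fused).squeeze(-1)     # (B,) similarity scores

# Example: score a batch of text-image pairs.
# model = ConvFusionCMIR()
# scores = model(torch.randn(4, 300), torch.randn(4, 2048))
```

Under this reading, training would pair matched and mismatched text-image examples and optimize a ranking or binary relevance loss over the scores; retrieval then ranks candidates of the other modality by score for a given query.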
