ACM Transactions on Multimedia Computing, Communications, and Applications

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Abstract

It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown a strong ability to model data distributions and learn discriminative representations. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations that bridge the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlations are explored simultaneously by the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. These mutually boost each other through adversarial training to make the generated common representations more discriminative. In summary, the proposed CM-GANs approach uses GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated. Extensive experiments verify the performance of CM-GANs on cross-modal retrieval against 13 state-of-the-art methods on 4 cross-modal datasets.
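The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the described components: two modality-specific autoencoders coupled by a weight-sharing constraint (here, a literally shared top encoder layer), plus the two kinds of discriminators (intra-modality and inter-modality). The fully connected layers, the feature dimensions (4096-d image features, 300-d text features), and all names are illustrative assumptions, not the paper's convolutional architecture.

```python
import torch
import torch.nn as nn

class ModalityAutoencoder(nn.Module):
    """One modality's autoencoder; the top encoder layer is shared across modalities."""
    def __init__(self, input_dim, hidden_dim, common_dim, shared_top):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.shared_top = shared_top  # weight-sharing constraint: same module in both branches
        self.decode = nn.Sequential(nn.Linear(common_dim, hidden_dim), nn.ReLU(),
                                    nn.Linear(hidden_dim, input_dim))

    def forward(self, x):
        c = self.shared_top(self.encode(x))  # common representation
        return c, self.decode(c)             # representation + reconstruction

hidden, common = 512, 256
shared_top = nn.Linear(hidden, common)  # couples the image and text branches
image_ae = ModalityAutoencoder(4096, hidden, common, shared_top)  # e.g., CNN image features
text_ae = ModalityAutoencoder(300, hidden, common, shared_top)    # e.g., averaged word vectors

# Two kinds of discriminators, as the abstract describes:
#   intra-modality: tells an original feature from its reconstruction within one modality
#   inter-modality: tells which modality a common representation came from
intra_disc_image = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 1))
intra_disc_text = nn.Sequential(nn.Linear(300, 256), nn.ReLU(), nn.Linear(256, 1))
inter_disc = nn.Sequential(nn.Linear(common, 128), nn.ReLU(), nn.Linear(128, 1))

# Generator-side losses (sketch): reconstruction within each modality, with adversarial
# terms against both discriminator types added during training.
img, txt = torch.randn(8, 4096), torch.randn(8, 300)
c_img, rec_img = image_ae(img)
c_txt, rec_txt = text_ae(txt)
recon_loss = nn.functional.mse_loss(rec_img, img) + nn.functional.mse_loss(rec_txt, txt)
```

Because `shared_top` is the same module object in both branches, gradients from image and text batches both update it, which is one simple way to realize a weight-sharing constraint on the common-representation layer.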