
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Abstract

Existing pre-training methods either focus on single-modal tasks or multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e., text or image) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a UNIfied-MOdal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free text corpora and image collections are utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space, over a corpus of image-text pairs augmented with related images and texts. With the help of rich non-paired single-modal data, our model is able to learn more generalizable representations by allowing textual knowledge and visual knowledge to enhance each other in the unified semantic space. The experimental results show that UNIMO greatly improves the performance of several single-modal and multi-modal downstream tasks.
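For intuition, the cross-modal contrastive objective described above can be sketched as a symmetric InfoNCE-style loss over a batch of aligned image-text embeddings. The sketch below is illustrative only, not the paper's implementation: the function name, the temperature value, and the random stand-in encoder outputs are assumptions, and UNIMO's retrieval- and rewriting-based augmentation of positives and negatives is omitted.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb: torch.Tensor,
                                 image_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of image-text pairs.

    text_emb, image_emb: (batch, dim) outputs of placeholder text and
    image encoders; row i of each tensor is assumed to be a matched pair.
    """
    # Normalize so dot products are cosine similarities in a shared
    # (unified) semantic space.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # logits[i, j] = similarity between text i and image j.
    logits = text_emb @ image_emb.t() / temperature

    # Diagonal entries are the positive (paired) examples;
    # off-diagonal entries act as in-batch negatives.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: text-to-image and image-to-text.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2i + loss_i2t)

# Toy usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    text = torch.randn(8, 256)
    image = torch.randn(8, 256)
    print(cross_modal_contrastive_loss(text, image).item())
```

Because both modalities are projected into the same space before the loss is computed, representations learned this way can, as the abstract claims, also serve single-modal tasks drawing on the non-paired text and image data.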