IEEE Transactions on Image Processing

MAVA: Multi-Level Adaptive Visual-Textual Alignment by Cross-Media Bi-Attention Mechanism

Abstract

Rapidly developing information technology has led to fast growth of visual and textual content, which brings huge challenges for correlating images with sentences and performing cross-media retrieval between them. Existing methods mainly explore cross-media correlation either from global-level instances, i.e., whole images and sentences, or from local-level fine-grained patches, i.e., discriminative image regions and key words, but they ignore the complementary information carried by the relations between local-level fine-grained patches. Naturally, relation understanding is highly important for learning cross-media correlation: people attend not only to the alignment between discriminative image regions and key words, but also to their relations within the visual and textual context. Therefore, in this paper, we propose the Multi-level Adaptive Visual-textual Alignment (MAVA) approach with the following contributions. First, we propose a cross-media multi-pathway fine-grained network to extract not only local fine-grained patches, i.e., discriminative image regions and key words, but also visual relations between image regions as well as textual relations from the context of sentences, which contain complementary information for exploiting fine-grained characteristics within different media types. Second, we propose a visual-textual bi-attention mechanism to distinguish fine-grained information of different saliency at both the local and relation levels, which provides more discriminative hints for correlation learning. Third, we propose cross-media multi-level adaptive alignment to explore global, local, and relation alignments. An adaptive alignment strategy is further proposed to enhance the matched pairs across media types and adaptively discard misalignments, so as to learn more precise cross-media correlation. Extensive experiments on image-sentence matching are conducted on two widely used cross-media datasets, namely Flickr-30K and MS-COCO, comparing with 10 state-of-the-art methods, which fully verifies the effectiveness of our proposed MAVA approach.
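To make the bi-attention idea concrete, below is a minimal sketch (in PyTorch) of one cross-media bi-attention step over region and word features. The function name `bi_attention`, the cosine-similarity affinity, and the softmax temperature are illustrative assumptions for this sketch, not details taken from the paper.

```python
# A minimal sketch of a cross-media bi-attention step, assuming image-region
# features V (m x d) and word features T (n x d) already live in a shared
# embedding space. All names and the temperature value are hypothetical.
import torch
import torch.nn.functional as F

def bi_attention(V: torch.Tensor, T: torch.Tensor, temperature: float = 9.0):
    """Attend words to regions and regions to words from one affinity matrix.

    V: (m, d) region features; T: (n, d) word features.
    Returns a text-attended visual context (n, d) and a
    vision-attended textual context (m, d).
    """
    # Cosine-similarity affinity between every region and every word.
    A = F.normalize(V, dim=-1) @ F.normalize(T, dim=-1).t()  # (m, n)

    # For each word, a saliency-weighted sum of regions (visual context).
    attn_v = F.softmax(temperature * A.t(), dim=-1)          # (n, m)
    visual_context = attn_v @ V                              # (n, d)

    # For each region, a saliency-weighted sum of words (textual context).
    attn_t = F.softmax(temperature * A, dim=-1)              # (m, n)
    textual_context = attn_t @ T                             # (m, d)

    return visual_context, textual_context
```

Under this sketch, a matching score for an image-sentence pair can then be read off as, e.g., the cosine similarity between each word feature in T and its attended visual context; the same machinery applies at the relation level by feeding relation features in place of V and T.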
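The adaptive alignment strategy can likewise be sketched as score aggregation that keeps cross-media pairs above a data-dependent threshold and discards the rest as misalignments. The mean-based threshold below is an assumption for illustration only; the paper's actual selection rule may differ.

```python
# A minimal sketch of adaptive alignment: enhance matched region-word pairs
# and adaptively discard misalignments. The thresholding rule is hypothetical.
import torch

def adaptive_alignment_score(A: torch.Tensor) -> torch.Tensor:
    """A: (m, n) region-word similarity matrix for one image-sentence pair.

    Returns a scalar matching score over adaptively selected pairs.
    """
    # Treat each word's best-matching region as a candidate alignment.
    best_per_word, _ = A.max(dim=0)                  # (n,)

    # Data-dependent threshold: the mean candidate similarity (an assumption).
    threshold = best_per_word.mean()

    # Keep matched pairs above the threshold, discard the rest.
    matched = best_per_word[best_per_word >= threshold]
    if matched.numel() == 0:
        return best_per_word.mean()
    return matched.mean()
```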
