SPCA-Net: a spatial position relationship-based co-attention network for visual question answering

Abstract

Recently, state-of-the-art methods for VQA (visual question answering) have relied mainly on co-attention to link each visual object with text objects, which achieves only a coarse interaction between the two modalities. Moreover, VQA models tend to focus on the association between visual and language features without considering the spatial relationships among the image region features extracted by Faster R-CNN. This paper proposes an effective deep co-attention network to address this problem. First, BERT is introduced to better capture the relationships between words and make the extracted text features more robust; second, a multimodal co-attention mechanism based on spatial position relationships is proposed to realize fine-grained interaction between question and image. It consists of three basic components: a text self-attention unit, an image self-attention unit, and a question-guided attention unit. The self-attention mechanism over image visual features integrates the spatial position and the width/height of each image region into the computed attention, so that each region is aware of the relative location and size of the other regions. Experimental results indicate that the proposed model significantly outperforms existing models.
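
The abstract describes the image self-attention unit only in words: attention over Faster R-CNN region features is combined with each region's spatial position and width/height so that regions become aware of one another's relative location and size. The PyTorch sketch below shows one plausible reading of such a geometry-aware self-attention; the class name GeometryAwareSelfAttention, the 4-d relative-geometry descriptor (normalized center offsets plus log width/height ratios), and the small MLP that turns it into an additive attention bias are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometryAwareSelfAttention(nn.Module):
    """Single-head self-attention over image region features whose
    attention logits receive a learned bias derived from relative box
    geometry (a sketch, not the authors' exact formulation)."""

    def __init__(self, dim: int, geo_hidden: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Maps a 4-d relative-geometry descriptor to a scalar logit bias.
        self.geo_mlp = nn.Sequential(
            nn.Linear(4, geo_hidden), nn.ReLU(), nn.Linear(geo_hidden, 1)
        )
        self.scale = dim ** -0.5

    @staticmethod
    def relative_geometry(boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (B, N, 4) as (cx, cy, w, h) from the region detector.
        cx, cy, w, h = boxes.unbind(-1)                       # each (B, N)
        dx = (cx.unsqueeze(2) - cx.unsqueeze(1)) / w.unsqueeze(2)
        dy = (cy.unsqueeze(2) - cy.unsqueeze(1)) / h.unsqueeze(2)
        dw = torch.log(w.unsqueeze(2) / w.unsqueeze(1))
        dh = torch.log(h.unsqueeze(2) / h.unsqueeze(1))
        return torch.stack((dx, dy, dw, dh), dim=-1)          # (B, N, N, 4)

    def forward(self, feats: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) region features; boxes: (B, N, 4), positive sizes.
        logits = (self.q(feats) @ self.k(feats).transpose(1, 2)) * self.scale
        geo_bias = self.geo_mlp(self.relative_geometry(boxes)).squeeze(-1)
        attn = F.softmax(logits + geo_bias, dim=-1)           # (B, N, N)
        return attn @ self.v(feats)

# Toy usage: 36 regions per image, as is common for Faster R-CNN features.
regions = torch.randn(2, 36, 512)
boxes = torch.rand(2, 36, 4) + 0.1    # keep widths/heights strictly positive
out = GeometryAwareSelfAttention(512)(regions, boxes)
print(out.shape)                      # torch.Size([2, 36, 512])

Adding the geometric bias to the logits before the softmax is one common design choice; relation-network-style variants instead clip the geometric term at zero and fold it in multiplicatively, and the abstract alone does not say which form the authors use.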
