International Conference on Pattern Recognition

Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering

Abstract

Visual Question Answering (VQA) is challenging due to complex cross-modal relations, and it has received extensive attention from the research community. From the human perspective, to answer a visual question one needs to read the question and then refer to the image to generate an answer. This answer is then checked against the question and image again for final confirmation. In this paper, we mimic this process and propose a fully attention-based VQA architecture. Moreover, an answer-checking module is proposed that performs unified attention on the joint answer, question, and image representation to update the answer. This mimics the human answer-checking process of considering the answer in context. With answer-checking modules and transferred BERT layers, our model achieves a state-of-the-art accuracy of 71.57% using fewer parameters on the VQA-v2.0 test-standard split.
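The abstract only sketches the architecture, but the answer-checking step it describes can be illustrated as a single self-attention pass over the concatenated answer, question, and image representations, after which the updated answer position is read out. The following PyTorch snippet is a minimal hypothetical sketch of that idea; the `AnswerChecker` name, the dimensions, and the single-block design are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the "answer-checking" idea: one unified
# self-attention pass over the joint [answer; question; image] sequence,
# whose output updates the answer representation. Names and sizes are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class AnswerChecker(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # One transformer-style self-attention block over the joint sequence.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, answer, question, image):
        # answer:   (B, 1, D)  candidate-answer embedding
        # question: (B, Lq, D) question token features (e.g. from BERT layers)
        # image:    (B, Lv, D) image region features
        joint = torch.cat([answer, question, image], dim=1)  # (B, 1+Lq+Lv, D)
        attended, _ = self.attn(joint, joint, joint)
        joint = self.norm(joint + attended)  # residual connection + layer norm
        # The first position is the answer token; its updated representation
        # is the "checked" answer used for the final prediction.
        return joint[:, 0]  # (B, D)

if __name__ == "__main__":
    checker = AnswerChecker()
    B, Lq, Lv, D = 2, 14, 36, 768
    updated = checker(torch.randn(B, 1, D),
                      torch.randn(B, Lq, D),
                      torch.randn(B, Lv, D))
    print(updated.shape)  # torch.Size([2, 768])
```

Attending over the whole joint sequence lets the answer token be re-weighted against both the question and the image at once, which is one plausible reading of the "unified attention" the abstract describes.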
