Explaining VQA predictions using visual grounding and a knowledge base update

Riquelme Felipe; De Goyeneche Alfredo; Zhang Yundong; Niebles Juan Carlos; Soto Alvaro

Abstract

In this work, we focus on the Visual Question Answering (VQA) task, where a model must answer a question based on an image, and the VQA-Explanations task, where an explanation is produced to support the answer. We introduce an interpretable model capable of pointing out and consuming information from a novel Knowledge Base (KB) composed of real-world relationships between objects, along with labels mined from available region descriptions and object annotations. Furthermore, this model provides visual and textual explanations to complement the KB visualization. Using a KB has two important consequences: enhanced predictions and improved interpretability. We achieve this by introducing a mechanism that can extract relevant information from the KB and point out the relations best suited for predicting the answer. A supervised attention map is generated over the KB to select the relevant relationships for each question-image pair. Moreover, we add image attention supervision to the explanations module to generate better visual and textual explanations. We quantitatively show that the predicted answers improve when using the KB; similarly, explanations improve when using the KB and when adding image attention supervision. We also show qualitatively that the KB attention helps to improve interpretability and enhance explanations. Overall, the results support the benefits of having multiple tasks to enhance the interpretability and performance of the model. (C) 2020 Elsevier B.V. All rights reserved.
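
The idea of a supervised attention map over KB relationships can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the cross-entropy supervision loss, and all names (`kb_attention`, `attention_supervision_loss`, the toy vectors) are assumptions chosen for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kb_attention(query, kb_embeddings):
    """Attend over KB relationship embeddings (hypothetical setup).

    query: a joint question-image embedding (assumed, not from the paper).
    kb_embeddings: one vector per KB relationship (assumed).
    Returns the attention weights and the attention-weighted KB vector.
    """
    scores = [dot(query, fact) for fact in kb_embeddings]
    weights = softmax(scores)
    dim = len(kb_embeddings[0])
    attended = [
        sum(w * fact[i] for w, fact in zip(weights, kb_embeddings))
        for i in range(dim)
    ]
    return weights, attended

def attention_supervision_loss(weights, target, eps=1e-12):
    """Cross-entropy between a target relevance distribution and the weights."""
    return -sum(t * math.log(w + eps) for t, w in zip(target, weights))

# Toy data: three KB relationships, the first marked as relevant.
query = [1.0, 0.0]
kb = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
weights, attended = kb_attention(query, kb)
target = [1.0, 0.0, 0.0]
loss = attention_supervision_loss(weights, target)
```

In this sketch the loss shrinks as the weight on the annotated relationship approaches 1, which is one simple way an attention map could be pushed toward the relationships marked as relevant for a question-image pair.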


Bibliographic record

Source
Image and Vision Computing, 2020, Issue 9, pp. 103968.1-103968.12 (12 pages)
Authors
Riquelme Felipe; De Goyeneche Alfredo; Zhang Yundong; Niebles Juan Carlos; Soto Alvaro;
Author affiliations

Pontificia Univ Catolica Chile Santiago Chile;

Pontificia Univ Catolica Chile Santiago Chile;

Stanford Univ Stanford CA 94305 USA;

Stanford Univ Stanford CA 94305 USA;

Pontificia Univ Catolica Chile Santiago Chile;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Text language: English
Keywords
Deep Learning; Attention; Supervision; Knowledge Base; Interpretability; Explainability;


