Journal: Image and Vision Computing

Explaining VQA predictions using visual grounding and a knowledge base update


Abstract

In this work, we focus on the Visual Question Answering (VQA) task, where a model must answer a question about an image, and the VQA-Explanations task, where an explanation is produced to support the answer. We introduce an interpretable model that can point to and consume information from a novel Knowledge Base (KB) composed of real-world relationships between objects, along with labels mined from available region descriptions and object annotations. The model also provides visual and textual explanations that complement the KB visualization. Using a KB has two important benefits: better predictions and improved interpretability. We achieve this by introducing a mechanism that extracts relevant information from the KB and points out the relations best suited for predicting the answer. A supervised attention map is generated over the KB to select the relevant relationships for each question-image pair. Moreover, we add image attention supervision to the explanation module to generate better visual and textual explanations. We show quantitatively that the predicted answers improve when the KB is used; explanations likewise improve with the KB and with the added image attention supervision. We also show qualitatively that the KB attention helps improve interpretability and enhances the explanations. Overall, the results support the benefits of training on multiple tasks to enhance both the interpretability and the performance of the model. (C) 2020 Elsevier B.V. All rights reserved.
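
The abstract describes a supervised attention map over the KB only at a high level. The sketch below illustrates one common way such a mechanism can be set up; it is not the authors' implementation. It assumes dot-product scoring, a softmax over KB entries, and a cross-entropy supervision loss. All names (`kb_attention`, `attention_supervision_loss`, the toy vectors) are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kb_attention(query, kb_embeddings):
    """Attend over KB relation embeddings with a question-image query.

    Assumption: relevance is scored by a plain dot product; the paper
    may use a different scoring function.
    """
    scores = [sum(q * k for q, k in zip(query, emb)) for emb in kb_embeddings]
    weights = softmax(scores)
    dim = len(query)
    # Context vector: attention-weighted sum of the KB embeddings.
    context = [
        sum(w * emb[d] for w, emb in zip(weights, kb_embeddings))
        for d in range(dim)
    ]
    return weights, context


def attention_supervision_loss(weights, relevant_mask, eps=1e-12):
    """Cross-entropy between the attention map and a target distribution.

    Assumption: supervision is a 0/1 mask marking the relevant relations,
    normalized so that it sums to 1.
    """
    total = sum(relevant_mask)
    if total == 0:
        raise ValueError("relevant_mask must mark at least one relation")
    target = [m / total for m in relevant_mask]
    return -sum(t * math.log(w + eps) for t, w in zip(target, weights))


# Toy example: a 2-d query and three hypothetical KB relation embeddings.
query = [1.0, 0.0]
kb = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
weights, context = kb_attention(query, kb)
loss = attention_supervision_loss(weights, [1, 0, 0])
```

In this toy setup the first relation is most aligned with the query, so it receives the largest attention weight. The loss is lower when that relation is marked as relevant than when the third one is. In a trained model, a loss of this kind would be added to the answer-prediction loss, which matches the multi-task framing of the abstract.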
