IEEE Transactions on Pattern Analysis and Machine Intelligence

Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool


Abstract

In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1]. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct, and content-correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA, because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by exposing its belief set: the set of question-answer pairs that the VQA model would predict to be true for a given image. This provides a completely new window into what VQA models 'believe' about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward.
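The belief-set idea described in the abstract can be illustrated procedurally. The sketch below is an informal illustration only: the callables `ivqa_generate_questions` and `vqa_predict_answer` are hypothetical placeholders standing in for an iVQA question generator and the VQA model under diagnosis; the paper itself couples these through a reinforcement learning framework rather than the simple filtering shown here.

```python
# Illustrative sketch of extracting a VQA model's "belief set" for one image.
# All function names are hypothetical placeholders, not the authors' API.
from typing import Callable, List, Tuple


def extract_belief_set(
    image,
    candidate_answers: List[str],
    ivqa_generate_questions: Callable,   # (image, answer, n) -> list of candidate questions
    vqa_predict_answer: Callable,        # (image, question) -> predicted answer string
    questions_per_answer: int = 5,
) -> List[Tuple[str, str]]:
    """Return question-answer pairs that the diagnosed VQA model would
    predict to be true for the given image (its 'belief set')."""
    beliefs = []
    for answer in candidate_answers:
        # Ask the iVQA generator for diverse questions whose intended answer is `answer`.
        questions = ivqa_generate_questions(image, answer, n=questions_per_answer)
        for question in questions:
            # Keep the pair if the VQA model under test agrees with the intended answer.
            if vqa_predict_answer(image, question) == answer:
                beliefs.append((question, answer))
    return beliefs
```

Inspecting the returned pairs is what exposes erroneous beliefs: any pair whose question does not actually apply to the image, yet is answered as intended, points to a weakness of the diagnosed model.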
