IEEE Transactions on Pattern Analysis and Machine Intelligence

Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool


Abstract

In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1]. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct, and content-correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA, because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by exposing its belief set: the set of question-answer pairs that the VQA model would predict to be true for a given image. This provides a completely new window into what VQA models 'believe' about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward.
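The belief-set idea described in the abstract can be illustrated procedurally. The sketch below is an informal illustration only: the callables `ivqa_generate_questions` and `vqa_predict_answer` are hypothetical placeholders standing in for an iVQA question generator and the VQA model under diagnosis; the paper itself couples these through a reinforcement learning framework rather than the simple filtering shown here.

```python
# Illustrative sketch of extracting a VQA model's "belief set" for one image.
# All function names are hypothetical placeholders, not the authors' API.
from typing import Callable, List, Tuple


def extract_belief_set(
    image,
    candidate_answers: List[str],
    ivqa_generate_questions: Callable,   # (image, answer, n) -> list of candidate questions
    vqa_predict_answer: Callable,        # (image, question) -> predicted answer string
    questions_per_answer: int = 5,
) -> List[Tuple[str, str]]:
    """Return question-answer pairs that the diagnosed VQA model would
    predict to be true for the given image (its 'belief set')."""
    beliefs = []
    for answer in candidate_answers:
        # Ask the iVQA generator for diverse questions whose intended answer is `answer`.
        questions = ivqa_generate_questions(image, answer, n=questions_per_answer)
        for question in questions:
            # Keep the pair if the VQA model under test agrees with the intended answer.
            if vqa_predict_answer(image, question) == answer:
                beliefs.append((question, answer))
    return beliefs
```

Inspecting the returned pairs is what exposes erroneous beliefs: any pair whose question does not actually apply to the image, yet is answered as intended, points to a weakness of the diagnosed model.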
