Venue: European Conference on Computer Vision

Revisiting Visual Question Answering Baselines


Abstract

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to perform "reasoning". Furthermore, for the task of multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art performance of 65.8% accuracy on the Visual7W Telling task and compares surprisingly well with the most complex systems proposed for the VQA Real Multiple Choice task. Additionally, we explore variants of the model and study the transferability of the model between both datasets. We also present an error analysis of our best model, the results of which suggest that a key problem of current VQA systems lies in the lack of visual grounding and localization of concepts that occur in the questions and answers.
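
The binary formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the one-hidden-layer scorer, the feature dimensions, the random untrained weights, and the names `make_mlp`, `triplet_score`, and `pick_answer` are all assumptions made for illustration. The sketch only shows the shape of the idea: the answer is an input, the model outputs the probability that an image-question-answer triplet is correct, and a multiple-choice question is answered by scoring each candidate and taking the highest score.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_mlp(input_dim, hidden_dim, rng):
    """Random, untrained weights for a hypothetical one-hidden-layer scorer."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
          for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def triplet_score(params, image_feat, question_feat, answer_feat):
    """Probability that the (image, question, answer) triplet is correct."""
    w1, b1, w2, b2 = params
    # The answer features are part of the input, not a class label.
    x = image_feat + question_feat + answer_feat  # list concatenation
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def pick_answer(params, image_feat, question_feat, candidate_feats):
    """Score every candidate answer independently and return the best index."""
    scores = [triplet_score(params, image_feat, question_feat, a)
              for a in candidate_feats]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Toy features standing in for real image, question, and answer embeddings.
rng = random.Random(0)
image_feat = [rng.random() for _ in range(4)]
question_feat = [rng.random() for _ in range(3)]
candidates = [[rng.random() for _ in range(3)] for _ in range(4)]

params = make_mlp(input_dim=4 + 3 + 3, hidden_dim=5, rng=rng)
best, scores = pick_answer(params, image_feat, question_feat, candidates)
print(best, [round(s, 3) for s in scores])
```

Because each candidate is scored on its own, the same scorer works for any number of answer choices, whereas a multi-class classifier is tied to a fixed answer vocabulary. In a trained version of this sketch, correct triplets would be labeled 1 and incorrect ones 0, with a binary cross-entropy loss.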
