Published in: European Conference on Computer Vision

Revisiting Visual Question Answering Baselines

Abstract

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to perform "reasoning". Furthermore, for the task of multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art performance of 65.8% accuracy on the Visual7W Telling task and compares surprisingly well with the most complex systems proposed for the VQA Real Multiple Choice task. Additionally, we explore variants of the model and study the transferability of the model between both datasets. We also present an error analysis of our best model, the results of which suggest that a key problem of current VQA systems lies in the lack of visual grounding and localization of concepts that occur in the questions and answers.
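The binary formulation described in the abstract can be illustrated with a small sketch: each candidate answer is paired with the image and the question, the triplet is scored on its own, and the highest-scoring candidate is chosen. The abstract does not specify the features or the network, so everything concrete here is an assumption made for illustration: the random feature vectors, the dimension `DIM`, the single logistic layer, and the hypothetical helper names `score_triplet` and `pick_answer`. This is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score_triplet(image_feat, question_feat, answer_feat, weights, bias):
    """Estimate the probability that one (image, question, answer) triplet is correct.

    Assumption: the three feature vectors are simply concatenated and fed to a
    single logistic unit. The paper's actual model may differ.
    """
    x = image_feat + question_feat + answer_feat  # list concatenation
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return sigmoid(z)


def pick_answer(image_feat, question_feat, candidate_feats, weights, bias):
    """Score every candidate answer independently and return the best index."""
    scores = [
        score_triplet(image_feat, question_feat, answer_feat, weights, bias)
        for answer_feat in candidate_feats
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: random stand-ins for real image, question, and answer features.
rng = random.Random(0)
DIM = 4  # hypothetical feature size per modality
image_feat = [rng.uniform(-1, 1) for _ in range(DIM)]
question_feat = [rng.uniform(-1, 1) for _ in range(DIM)]
candidate_feats = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(4)]

# Untrained parameters; in practice these would be learned from triplets
# labeled correct or incorrect.
weights = [rng.gauss(0, 1) for _ in range(3 * DIM)]
bias = 0.0

best, scores = pick_answer(image_feat, question_feat, candidate_feats, weights, bias)
print("chosen candidate:", best)
print("per-candidate scores:", [round(s, 3) for s in scores])
```

Unlike a multi-class classifier with one output per answer, the scores here are not normalized across candidates: each triplet is judged independently, and the multiple-choice decision is only the final argmax.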
