Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing



Abstract

Despite significant success in Visual Question Answering (VQA), VQA models have been shown to be notoriously brittle to linguistic variations in the questions. Due to deficiencies in models and datasets, today’s models often rely on correlations rather than predictions that are causal w.r.t. data. In this paper, we propose a novel way to analyze and measure the robustness of state-of-the-art models w.r.t. semantic visual variations, and we propose ways to make models more robust against spurious correlations. Our method performs automated semantic image manipulations and tests for consistency in model predictions, both to quantify model robustness and to generate synthetic data to counter these problems. We perform our analysis on three diverse, state-of-the-art VQA models and diverse question types, with a particular focus on challenging counting questions. In addition, we show that models can be made significantly more robust against inconsistent predictions using our edited data. Finally, we show that the results also translate to real-world error cases of state-of-the-art models, which results in improved overall performance.
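The consistency test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the model interface, the function names `invariant_flip_rate` and `covariant_accuracy`, and the data layout are all assumptions made for illustration. The sketch assumes that an invariant edit changes something irrelevant to the question (so the answer should stay the same), and that a covariant edit changes the image in a way that yields a known new answer, for example a different count.

```python
from typing import Callable, Hashable, Iterable, Tuple

# Hypothetical model interface (an assumption, not from the paper):
# a callable mapping (image, question) to an answer string.
Model = Callable[[Hashable, str], str]


def invariant_flip_rate(
    model: Model,
    triples: Iterable[Tuple[Hashable, Hashable, str]],
) -> float:
    """Fraction of (original, edited, question) triples whose answer changes.

    Assumes every edit is irrelevant to its question, so a consistent
    model should answer identically before and after the edit.
    """
    triples = list(triples)
    if not triples:
        return 0.0
    flips = sum(model(orig, q) != model(edited, q) for orig, edited, q in triples)
    return flips / len(triples)


def covariant_accuracy(
    model: Model,
    triples: Iterable[Tuple[Hashable, str, str]],
) -> float:
    """Fraction of (edited, question, expected_answer) triples answered correctly.

    Assumes the edit changes the image so that the correct answer is known,
    e.g. a counting question whose expected count changed with the edit.
    """
    triples = list(triples)
    if not triples:
        return 0.0
    hits = sum(model(edited, q) == expected for edited, q, expected in triples)
    return hits / len(triples)
```

A lower flip rate on invariant edits and a higher accuracy on covariant edits would both indicate a more consistent model under these assumptions. The edited examples could also be added to the training data, in the spirit of the synthetic data the abstract mentions.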
