
Visual Question Reasoning on General Dependency Tree

IEEE/CVF Conference on Computer Vision and Pattern Recognition


Abstract

Collaborative reasoning over each image-question pair is critical but under-explored for building an interpretable Visual Question Answering (VQA) system. Although recent works have also tried explicit compositional processes to assemble the multiple sub-tasks embedded in a question, their models rely heavily on annotations or hand-crafted rules to obtain valid reasoning layouts, leading to either heavy labor or poor performance on compositional reasoning. In this paper, to enable global context reasoning that better aligns the image and language domains in diverse and unrestricted cases, we propose a novel reasoning network called the Adversarial Composition Modular Network (ACMN). This network comprises two collaborative modules: i) an adversarial attention module that exploits the local visual evidence for each word parsed from the question; ii) a residual composition module that composes the previously mined evidence. Given a dependency parse tree for each question, the adversarial attention module progressively discovers the salient regions of one word by densely combining the regions of its child word nodes in an adversarial manner. The residual composition module then merges the hidden representations of an arbitrary number of children through sum pooling and a residual connection. Our ACMN is thus capable of building an interpretable VQA system that gradually dives into the image cues along a question-driven reasoning route and performs global reasoning by incorporating the learned knowledge of all attention modules in a principled manner. Experiments on relational datasets demonstrate the superiority of our ACMN, and visualization results show the explainable capability of our reasoning system.
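
As a rough illustration of the reasoning route the abstract describes, the sketch below walks a question's dependency parse tree bottom-up: each node attends over image region features (a plain soft-attention stub standing in for the paper's adversarial attention module), and the hidden states of its children are merged by sum pooling plus a residual connection. All names (TreeNode, TreeReasoner, word_feat, and so on) are hypothetical; this is a minimal sketch of the mechanism under stated assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
import torch
import torch.nn as nn


@dataclass
class TreeNode:
    """A node of the question's dependency parse tree (hypothetical)."""
    word_feat: torch.Tensor                # embedding of this word, shape (d,)
    children: list["TreeNode"] = field(default_factory=list)


class TreeReasoner(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.att = nn.Linear(d, d)    # stub for the attention module
        self.proj = nn.Linear(d, d)   # used by the composition step

    def attend(self, node_h: torch.Tensor, img: torch.Tensor) -> torch.Tensor:
        # Soft attention over image regions img: (num_regions, d).
        # Stands in for the paper's adversarial attention module.
        scores = img @ self.att(node_h)                       # (num_regions,)
        return (torch.softmax(scores, dim=0).unsqueeze(1) * img).sum(0)

    def forward(self, node: TreeNode, img: torch.Tensor) -> torch.Tensor:
        # Recurse into the children first: reasoning proceeds bottom-up
        # along the dependency tree, following the question-driven route.
        child_hs = [self.forward(c, img) for c in node.children]
        # Residual composition: sum pooling merges an arbitrary number
        # of children, and the residual connection keeps the parent's
        # own representation intact.
        pooled = (torch.stack(child_hs).sum(0) if child_hs
                  else torch.zeros_like(node.word_feat))
        h = node.word_feat + torch.relu(self.proj(pooled))
        # Mine local visual evidence for this word and fold it in.
        return h + self.attend(h, img)
```

For a relational question such as "What is left of the red cube?" (an illustrative example), evidence would first be attended for the leaf words and then composed upward node by node, so the representation at the root aggregates the learned knowledge of all attention steps.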
