IEEE Transactions on Pattern Analysis and Machine Intelligence

Interpretable Visual Question Answering by Reasoning on Dependency Trees

Abstract

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems. Although very recent studies have attempted to use explicit compositional processes to assemble multiple subtasks embedded in questions, their models heavily rely on annotations or handcrafted rules to obtain valid reasoning processes, which leads to either heavy workloads or poor performance on compositional reasoning. In this paper, to better align image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question; thus, our model is called a parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module that exploits the local visual evidence of each word parsed from the question, ii) a gated residual composition module that composes the previously mined evidence, and iii) a parse-tree-guided propagation module that passes the mined evidence along the parse tree. Thus, PTGRN is capable of building an interpretable visual question answering (VQA) system that gradually derives image cues following question-driven parse-tree reasoning. Experiments on relational datasets demonstrate the superiority of PTGRN over current state-of-the-art VQA methods, and the visualization results highlight the explainable capability of our reasoning system.
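The three modules named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the dot-product attention, the scalar sigmoid gate, the summation over children, and every name below (`Node`, `attend`, `compose`, `propagate`, `embed`, `regions`) are assumptions chosen only to show the overall flow of evidence from the leaves of a dependency tree to its root.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]


@dataclass
class Node:
    """One question word in a dependency tree (hypothetical structure)."""
    word: str
    children: List["Node"] = field(default_factory=list)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vec: Vector, regions: List[Vector]) -> Vector:
    """Attention stand-in: softmax over word-region dot products (assumed)."""
    weights = softmax([dot(word_vec, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def compose(own: Vector, from_children: Vector) -> Vector:
    """Gated residual stand-in: own evidence plus gated child evidence (assumed)."""
    gate = 1.0 / (1.0 + math.exp(-dot(own, from_children)))
    return [o + gate * c for o, c in zip(own, from_children)]


def propagate(node: Node, embed: Dict[str, Vector], regions: List[Vector]) -> Vector:
    """Pass evidence bottom-up along the tree (post-order traversal)."""
    own = attend(embed[node.word], regions)
    if not node.children:
        return own
    child_evidence = [propagate(c, embed, regions) for c in node.children]
    summed = [sum(v[i] for v in child_evidence) for i in range(len(own))]
    return compose(own, summed)


# Toy usage: two image regions, and a two-word tree where "black" modifies "cat".
regions = [[1.0, 0.0], [0.0, 1.0]]
embed = {"cat": [1.0, 0.0], "black": [0.0, 1.0]}
tree = Node("cat", [Node("black")])
root_evidence = propagate(tree, embed, regions)
```

In this sketch the root vector accumulates the evidence of every word beneath it, and the per-node attention weights are what a visualization of the reasoning process would display. A real system would use learned parameters and a dependency parser rather than the hand-built tree and embeddings shown here.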