Conference: International Conference on Computer Vision

Dynamic Graph Attention for Referring Expression Comprehension

Abstract

Referring expression comprehension aims to locate the object instance in an image that a natural language referring expression describes. The task is compositional and inherently requires visual reasoning over the relationships among the objects in the image. This visual reasoning process is, in turn, guided by the linguistic structure of the referring expression. However, existing approaches either treat objects in isolation or explore only first-order relationships between objects, without aligning the reasoning with the potential complexity of the expression. As a result, they struggle to ground complex referring expressions. In this paper, we approach referring expression comprehension from the perspective of language-driven visual reasoning and propose a dynamic graph attention network that performs multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression. In particular, we construct a graph for the image whose nodes and edges correspond to the objects and their relationships, respectively; propose a differential analyzer to predict a language-guided visual reasoning process; and perform stepwise reasoning over the graph to update the compound object representation at every node. Experimental results demonstrate that the proposed method not only significantly outperforms existing state-of-the-art algorithms on three common benchmark datasets, but also generates interpretable visual evidence for locating, step by step, the objects referred to in complex language descriptions.
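The stepwise reasoning described in the abstract can be illustrated with a toy sketch. This is not the authors' network: the function names `stepwise_reasoning` and `ground`, the hand-written features, and the per-step edge queries (which stand in for the output of the language analyzer) are all assumptions made for illustration. The example grounds "the cup on the table next to the lamp" in a scene with two cups and two tables, where only two steps of message passing can tell the cups apart.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stepwise_reasoning(node_feats, edges, edge_queries):
    """Update a compound representation at every node, one step per query.

    node_feats:   one feature vector per object (graph node).
    edges:        {(src, dst): relation feature vector}.
    edge_queries: one vector per reasoning step; a hypothetical stand-in
                  for the language-guided process predicted from the expression.
    """
    compound = [list(f) for f in node_feats]
    for query in edge_queries:
        # Synchronous update: messages read the previous step's state.
        previous = [list(h) for h in compound]
        for (src, dst), rel in edges.items():
            # How strongly this edge matches the relation attended at this step.
            relevance = max(0.0, dot(query, rel))
            for k, value in enumerate(previous[src]):
                compound[dst][k] += relevance * value
    return compound

def ground(node_feats, edges, edge_queries, target_query):
    """Return a probability over nodes for being the referred object."""
    compound = stepwise_reasoning(node_feats, edges, edge_queries)
    return softmax([dot(target_query, h) for h in compound])

# Toy scene (assumed): features are [is_cup, is_table, is_lamp].
NODES = [
    [1.0, 0.0, 0.0],  # 0: cup A
    [1.0, 0.0, 0.0],  # 1: cup B
    [0.0, 1.0, 0.0],  # 2: table X
    [0.0, 1.0, 0.0],  # 3: table Y
    [0.0, 0.0, 1.0],  # 4: lamp
]
# Relation features are [on, next_to]; messages flow src -> dst.
EDGES = {
    (4, 2): [0.0, 1.0],  # lamp is next to table X
    (2, 0): [1.0, 0.0],  # cup A is on table X
    (3, 1): [1.0, 0.0],  # cup B is on table Y
}
NEXT_TO = [0.0, 1.0]
ON = [1.0, 0.0]
TARGET = [1.0, 1.0, 1.0]  # a cup whose context includes a table and a lamp

probs = ground(NODES, EDGES, [NEXT_TO, ON], TARGET)
best = max(range(len(probs)), key=probs.__getitem__)  # best == 0 (cup A)

# With a single first-order step, the two cups are indistinguishable.
one_step = ground(NODES, EDGES, [ON], TARGET)
```

In this sketch, the first step passes the lamp's features to table X and the second step passes the enriched table representation to cup A, so the per-step relevance weights also show which relationship was used at each stage.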
