Due to the rise of deep learning, reasoning across various domains, such as vision, language, robotics, and control, has seen major progress in recent years. A popular benchmark for evaluating models for visual reasoning is Visual Question Answering (VQA), which aims at answering questions about a given input image by joining the two modalities: (1) the text representing the question and (2) the visual information extracted from the input image. In this work, we propose a structured approach for VQA that is based on dynamic graphs learned automatically from the input. Unlike the common approach for VQA, which relies on an attention mechanism applied to a cell-structured global embedding of the image, our model leverages the rich structure of the image captured by the object instances and their interactions. In our model, nodes in the graph correspond to object instances present in the image, while the edges represent relations among them. Our model automatically constructs the scene graph and attends to the relations among the nodes to answer the given question. Hence, our model can be trained end-to-end and does not require additional training labels in the form of predefined graphs or relations. We demonstrate the effectiveness of our approach on the challenging open-ended Visual Genome benchmark for VQA.
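To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the core idea: detected object instances become graph nodes, a soft adjacency over node pairs is learned from the input itself, and a question embedding guides attention over the resulting node states before an open-ended answer is predicted. All module names, dimensions, and the single round of message passing are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of question-conditioned graph learning for VQA.
# Assumed inputs: per-object detector features and a question embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphVQASketch(nn.Module):
    def __init__(self, obj_dim=2048, q_dim=1024, hid=512, n_answers=3000):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim, hid)      # object features -> node states
        self.q_proj = nn.Linear(q_dim, hid)          # question embedding -> hidden space
        self.edge_score = nn.Linear(2 * hid, 1)      # pairwise edge logits (learned graph)
        self.att = nn.Linear(2 * hid, 1)             # question-guided node attention
        self.classifier = nn.Linear(hid, n_answers)  # open-ended answer classifier

    def forward(self, obj_feats, q_emb):
        # obj_feats: (B, N, obj_dim) detected object instances; q_emb: (B, q_dim)
        nodes = torch.relu(self.obj_proj(obj_feats))           # (B, N, hid)
        B, N, H = nodes.shape
        # Build a dense, soft adjacency from pairwise node states:
        # the graph structure is learned, not given as a label.
        pair = torch.cat([nodes.unsqueeze(2).expand(B, N, N, H),
                          nodes.unsqueeze(1).expand(B, N, N, H)], dim=-1)
        adj = F.softmax(self.edge_score(pair).squeeze(-1), dim=-1)  # (B, N, N)
        # One round of message passing along the learned relations.
        nodes = nodes + torch.bmm(adj, nodes)
        # Attend to node states conditioned on the question, then classify.
        q = torch.relu(self.q_proj(q_emb)).unsqueeze(1).expand(B, N, H)
        a = F.softmax(self.att(torch.cat([nodes, q], dim=-1)).squeeze(-1), dim=-1)
        pooled = (a.unsqueeze(-1) * nodes).sum(dim=1)          # (B, hid)
        return self.classifier(pooled)

# Example: a batch of 2 images with 36 detected objects each.
logits = GraphVQASketch()(torch.randn(2, 36, 2048), torch.randn(2, 1024))
```

Because both the edge scores and the attention weights are produced by differentiable layers, the whole sketch trains end-to-end from question-answer supervision alone, mirroring the paper's claim that no predefined graphs or relation labels are required.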