Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space

Gao Difei; Wang Ruiping; Shan Shiguang; Chen Xilin


Abstract

Solving the visual question answering (VQA) task requires recognizing many diverse visual concepts as answers. These visual concepts carry rich structural semantics: some concepts in VQA are highly related (e.g., red and blue), while others are largely unrelated (e.g., red and standing). Humans naturally learn concepts efficiently by using their semantic meanings to concentrate on distinguishing related concepts and to ignore the interference of unrelated ones. However, previous works usually use a simple MLP to output the visual concept as the answer in a flat label space that treats all labels equally, which limits their ability to represent and exploit the semantic meanings of labels. To address this issue, we propose a novel visual recognition module named Dynamic Concept Recognizer (DCR), which can easily be plugged into an attention-based VQA model, to utilize the semantics of the labels in answer prediction. Concretely, we introduce two key features in DCR: 1) A novel structural label space that depicts the semantic differences between concepts, in which labels are assigned to different groups according to their meanings. This type of semantic information helps decompose the visual recognizer in VQA into multiple specialized sub-recognizers, improving the capacity and efficiency of the recognizer. 2) A feature attention mechanism that captures the similarity between related groups of concepts, e.g., the human-related group "chef, waiter" is more related to "swimming, running, etc." than the scene-related group "sunny, rainy, etc." is. This type of semantic information helps sub-recognizers for related groups adaptively share some of their modules and share knowledge with one another, which facilitates learning. Extensive experiments on several datasets show that the proposed structural label space and DCR module can efficiently learn visual concept recognition and improve the performance of the VQA model.
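The contrast between a flat label space and a structural one can be pictured with a small sketch. This is not the authors' DCR implementation: the group names, the scores, and the functions `flat_probabilities` and `grouped_probabilities` are all hypothetical, and a per-group softmax is used here only as a simple stand-in for the specialized sub-recognizers the abstract describes. The feature attention mechanism is not modeled.

```python
import math

# Hypothetical grouping of answer labels by meaning (illustrative only,
# loosely based on the examples mentioned in the abstract).
LABEL_GROUPS = {
    "color": ["red", "blue"],
    "action": ["standing", "swimming", "running"],
    "weather": ["sunny", "rainy"],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flat_probabilities(label_scores):
    """Flat label space: every label competes with every other label."""
    labels = list(label_scores)
    probs = softmax([label_scores[label] for label in labels])
    return dict(zip(labels, probs))


def grouped_probabilities(label_scores, groups=LABEL_GROUPS):
    """Structural label space (assumed form): labels compete only within their group."""
    result = {}
    for group, labels in groups.items():
        probs = softmax([label_scores[label] for label in labels])
        result[group] = dict(zip(labels, probs))
    return result


# Made-up recognizer scores for a single image region.
scores = {
    "red": 2.0, "blue": 1.0,
    "standing": 2.5, "swimming": 0.0, "running": 0.5,
    "sunny": 0.1, "rainy": 0.0,
}

flat = flat_probabilities(scores)
grouped = grouped_probabilities(scores)
```

In the flat case, "red" loses probability mass to the unrelated label "standing"; in the grouped case it is compared only against "blue", which is the kind of separation between related and unrelated concepts that the abstract motivates.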


Bibliographic record

Source
IEEE Journal of Selected Topics in Signal Processing | 2020, Issue 3 | pp. 494-505 | 12 pages
Authors
Gao Difei; Wang Ruiping; Shan Shiguang; Chen Xilin
Author affiliations

Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Beijing 100049, Peoples R China (listed identically for each of the four authors)
Original format: PDF
Language: English
Keywords
Visualization; Semantics; Grounding; Knowledge discovery; Sports; Task analysis; Image recognition; Visual question answering; visual concept recognition; structural label space;


