Journal: IEEE Journal of Selected Topics in Signal Processing

Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space



Abstract

Solving the visual question answering (VQA) task requires recognizing many diverse visual concepts as answers. These concepts carry rich structural semantics: some concepts in VQA are closely related (e.g., red and blue), while others are largely unrelated (e.g., red and standing). Humans naturally learn concepts efficiently by using their semantics to focus on distinguishing related concepts while ignoring interference from unrelated ones. However, previous works usually use a simple MLP to output the visual concept as the answer in a flat label space that treats all labels equally, which limits how well the semantics of the labels can be represented and used. To address this issue, we propose a novel visual recognition module named Dynamic Concept Recognizer (DCR), which can be easily plugged into an attention-based VQA model, to utilize the semantics of the labels in answer prediction. Concretely, we introduce two key features in DCR: 1) A novel structural label space that depicts the semantic differences between concepts, in which labels are assigned to different groups according to their meanings. This type of semantic information helps decompose the visual recognizer in VQA into multiple specialized sub-recognizers, improving the capacity and efficiency of the recognizer. 2) A feature attention mechanism that captures the similarity between related groups of concepts; e.g., the human-related group "chef, waiter" is more related to "swimming, running, etc." than to the scene-related group "sunny, rainy, etc.". This type of semantic information helps the sub-recognizers for related groups adaptively share some modules and share knowledge with one another, which facilitates learning. Extensive experiments on several datasets show that the proposed structural label space and DCR module learn visual concept recognition efficiently and improve the performance of the VQA model.
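The two ideas in the abstract, a grouped label space and attention-based sharing of modules between sub-recognizers, can be illustrated with a toy sketch. This is not the authors' DCR implementation: the abstract gives no architectural details, so the group names, the class `GroupedRecognizer`, the random untrained weights, and the factorization of the answer probability into "group probability times within-group probability" are all assumptions made for illustration. Only the example labels (red, blue, standing, chef, waiter, swimming, running, sunny, rainy) come from the abstract.

```python
import math
import random

# Hypothetical structural label space: answer labels grouped by meaning.
# The labels appear in the abstract; the group names are assumptions.
GROUPS = {
    "color": ["red", "blue"],
    "action": ["standing", "swimming", "running"],
    "occupation": ["chef", "waiter"],
    "weather": ["sunny", "rainy"],
}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class GroupedRecognizer:
    """Toy recognizer: each group mixes shared modules, then applies its own head."""

    def __init__(self, dim, num_modules=3, seed=0):
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.gauss(0.0, 0.1) for _ in range(n)]

        # Shared modules (assumption): each is a dim x dim linear map.
        self.modules = [[rand_vec(dim) for _ in range(dim)]
                        for _ in range(num_modules)]
        # Per-group attention logits over the shared modules. Groups with
        # similar logits reuse the same modules, i.e. they share knowledge.
        self.group_logits = {g: rand_vec(num_modules) for g in GROUPS}
        # Per-group sub-recognizer head: one weight vector per label.
        self.heads = {g: [rand_vec(dim) for _ in labels]
                      for g, labels in GROUPS.items()}

    def module_weights(self, group):
        """Attention weights of one group over the shared modules."""
        return softmax(self.group_logits[group])

    def predict(self, feature, group_scores):
        """Return {label: probability}.

        feature: fused image-question vector (list of floats).
        group_scores: {group: logit}, e.g. inferred from the question type.
        """
        names = list(GROUPS)
        group_probs = softmax([group_scores[g] for g in names])
        module_outs = [[dot(row, feature) for row in m] for m in self.modules]
        result = {}
        for g, p_group in zip(names, group_probs):
            w = self.module_weights(g)
            # Group-specific hidden vector: attention-weighted module mix.
            hidden = [sum(w[k] * module_outs[k][i] for k in range(len(w)))
                      for i in range(len(feature))]
            label_probs = softmax([dot(h, hidden) for h in self.heads[g]])
            for label, p in zip(GROUPS[g], label_probs):
                result[label] = p_group * p
        return result


# Usage: a question such as "What color is the shirt?" would favor "color".
model = GroupedRecognizer(dim=4)
scores = {"color": 2.0, "action": 0.0, "occupation": 0.0, "weather": -1.0}
probs = model.predict([0.5, -1.0, 0.25, 2.0], scores)
```

Because each sub-recognizer only has to separate the labels inside its own group, "red" competes with "blue" but never directly with "standing", which is the contrast with a flat label space that the abstract draws.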
