Exploiting hierarchical visual features for visual question answering

Hong Jongkwang; Fu Jianlong; Uh Youngjung; Mei Tao; Byun Hyeran

Abstract

Visual question answering (VQA) aims to infer an answer given a textual question and an image. Previous VQA approaches use only the highest layer of a Convolutional Neural Network (CNN) for visual representation, which is biased toward the object classification task. These object-categorization-oriented features lose low-level semantics such as color, texture, and the number of instances, which attribute-related questions depend on. Consequently, conventional VQA methods are vulnerable to low-level semantic questions. Lower-layer features, on the other hand, retain these low-level semantics. We therefore hypothesize that lower-layer features are better suited to low-level semantic questions, and we support this through experiments. Furthermore, we propose a novel VQA model named Hierarchical Feature Network (HFnet), which exploits intermediate CNN layers to derive various semantics for VQA. In the answer reasoning stage, each hierarchical feature is combined with an attention map and fused by multimodal pooling so that both high- and low-level semantic questions are considered. Our proposed model outperforms the existing methods. Qualitative experiments also demonstrate that HFnet is superior at inferring attention regions. (C) 2019 Elsevier B.V. All rights reserved.
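
The answer reasoning stage described in the abstract (attend over each hierarchical feature, fuse it with the question, and keep every level) can be illustrated with a minimal sketch. This is not the authors' HFnet implementation: the abstract does not specify the attention form or the pooling operator, so dot-product soft attention and an element-wise product are assumed here as simple stand-ins, and all function names, dimensions, and random features are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, question):
    """Question-guided soft attention over one layer's region features.

    Assumption: dot-product scoring; the abstract does not name the scoring function.
    """
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, question)) for r in regions]
    weights = softmax(scores)
    dim = len(question)
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return attended, weights


def fuse(visual, question):
    """Element-wise product, a simple stand-in for multimodal pooling (assumption)."""
    return [v * q for v, q in zip(visual, question)]


def hierarchical_joint_feature(layer_features, question):
    """Attend and fuse each CNN level separately, then concatenate the results."""
    joint = []
    for regions in layer_features:
        attended, _ = attend(regions, question)
        joint.extend(fuse(attended, question))
    return joint


random.seed(0)
DIM = 4  # hypothetical common embedding size for all levels and the question

# Hypothetical features: three CNN levels with 16, 9, and 4 spatial regions,
# assumed to be already projected to DIM dimensions.
layer_features = [
    [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(n)]
    for n in (16, 9, 4)
]
question = [random.gauss(0, 1) for _ in range(DIM)]

joint = hierarchical_joint_feature(layer_features, question)
print(len(joint))  # 12: one fused DIM-vector per level
```

In this sketch the concatenated vector would feed an answer classifier; keeping one fused vector per level is what lets low-level cues (color, texture, counts) and high-level cues (object categories) both reach that classifier.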

Bibliographic details

Source: Neurocomputing, 2019, issue 25, pp. 187-195 (9 pages)
Authors: Hong Jongkwang; Fu Jianlong; Uh Youngjung; Mei Tao; Byun Hyeran
Author affiliations:

Yonsei Univ, Dept Comp Sci, Seoul, South Korea;

Microsoft Res Asia, Beijing, Peoples R China;

Naver Clova AI, Seoul, South Korea;

JD AI, Beijing, Peoples R China;

Yonsei Univ, Dept Comp Sci, Seoul, South Korea;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language of text: English
Keywords:
Visual question answering; Multi-level features; Neural networks;

Similar literature

1. Exploiting hierarchical visual features for visual question answering [J]. Hong Jongkwang, Fu Jianlong, Uh Youngjung. Neurocomputing, 2019, issue Jul25
2. Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering [J]. Manmadhan Sruthy, Kovoor Binsu C. Image and Vision Computing, 2021, issue Nova
3. Hierarchical deep multi-modal network for medical visual question answering [J]. Deepak Gupta, Swati Suman, Asif Ekbal. Expert Systems with Applications, 2021, issue Feba
4. Hierarchical Question-Image Co-Attention for Visual Question Answering [C]. Jiasen Lu, Jianwei Yang, Dhruv Batra. Annual Conference on Neural Information Processing Systems, 2016
5. Attention Correction Mechanisms in Visual Contexts in Visual Question Answering [D]. Sharan, Komal. 2018
6. An Effective Dense Co-Attention Networks for Visual Question Answering [O]. Shirong He, Dezhi Han. 2020
7. Hierarchical deep multi-modal network for medical visual question answering [O]. Deepak Gupta, Swati Suman, Asif Ekbal. 2021