Information fusion in visual question answering: A Survey

Zhang Dongxiang; Cao Rui; Wu Sai

Abstract

Visual question answering (VQA) automatically answers natural language questions about the content of an image or video. The task is challenging because it requires understanding the semantic information in both the textual and visual channels, as well as their interplay. A typical solver is composed of three components: feature extraction from each single modality, feature fusion between the visual and textual channels, and answer prediction based on the learnt joint representation. Among them, information fusion plays a key role in enhancing the overall accuracy, and various types of approaches have been proposed, such as simple vector operators, deep neural networks, bilinear pooling, attention mechanisms, and memory networks. The primary objective of this survey is to provide a clear organization and comprehensive review of the fusion techniques proposed to date in the domain of visual question answering. We propose an abstract fusion framework that can fit the majority of existing VQA models, making it convenient for readers to quickly understand their key contributions. Finally, we summarize the effective fusion strategies that have been widely adopted, so as to benefit readers in their model design.
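
The three-component pipeline named in the abstract (per-modality features, fusion, answer prediction) can be illustrated with a minimal sketch. This is not the survey's framework or any published model: the function names, dimensions, and random stand-in features below are assumptions chosen for illustration, and only three of the listed fusion families are shown (simple vector operators, bilinear pooling, and a basic attention step).

```python
import math
import random

def fuse_concat(v, q):
    """Simple vector operator: concatenate visual and question features."""
    return v + q

def fuse_product(v, q):
    """Simple vector operator: element-wise product (equal dimensions assumed)."""
    return [a * b for a, b in zip(v, q)]

def fuse_bilinear(v, q, W):
    """Bilinear pooling: z[k] = sum_i sum_j v[i] * W[k][i][j] * q[j]."""
    return [
        sum(v[i] * Wk[i][j] * q[j] for i in range(len(v)) for j in range(len(q)))
        for Wk in W
    ]

def attend(regions, q):
    """Question-guided attention: softmax over region/question dot products,
    then a weighted sum of the region features."""
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, q)) for r in regions]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(len(q))]
    return pooled, weights

# Hypothetical sizes; random vectors stand in for real extracted features.
random.seed(0)
d, n_regions, out_dim, n_answers = 4, 3, 6, 5
regions = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_regions)]
q = [random.gauss(0, 1) for _ in range(d)]

# Fusion: attend over image regions, then combine with the question bilinearly.
v, weights = attend(regions, q)
W = [[[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)] for _ in range(out_dim)]
joint = fuse_bilinear(v, q, W)

# Answer prediction: a linear classifier over the joint representation.
W_out = [[random.gauss(0, 0.1) for _ in range(out_dim)] for _ in range(n_answers)]
logits = [sum(w * z for w, z in zip(row, joint)) for row in W_out]
answer_id = max(range(n_answers), key=lambda k: logits[k])
```

The full bilinear tensor `W` grows with the product of the two feature dimensions and the output dimension, which is why practical bilinear-pooling methods typically rely on compact or factorized approximations rather than the dense form shown here.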

Bibliographic details

Source
Information Fusion | 2019, issue 2019 | 13 pages
Authors
Zhang Dongxiang; Cao Rui; Wu Sai
Author affiliations

Zhejiang Univ Coll Comp Sci & Technol Hangzhou Zhejiang Peoples R China;

Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu Sichuan Peoples R China;

Zhejiang Univ Coll Comp Sci & Technol Hangzhou Zhejiang Peoples R China;

Indexing information
Original format: PDF
Language of text: eng
Chinese Library Classification: Computing technology, computer technology;
Keywords
Information fusion; Visual question answering; Survey;

Similar literature

1. Information fusion in visual question answering: A Survey [J]. Zhang Dongxiang, Cao Rui, Wu Sai. Information Fusion. 2019.
2. Multimodal feature fusion by relational reasoning and attention for visual question answering [J]. Zhang Weifeng, Yu Jing, Hu Hua. Information Fusion. 2020.
3. Visual question answering: A survey of methods and datasets [J]. Qi Wu, Damien Teney, Peng Wang. Computer Vision and Image Understanding. 2017, issue octa.
4. BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection [C]. Hedi Ben-younes, Remi Cadene, Nicolas Thome. AAAI Conference on Artificial Intelligence. 2019.
5. Attention Correction Mechanisms in Visual Contexts in Visual Question Answering [D]. Sharan, Komal. 2018.
6. A Depth Evidence Score Fusion Algorithm for Chinese Medical Intelligence Question Answering System [O]. Xiabing Zhou, Binglin Wu, Qinglei Zhou. 2018.
7. Improved Fusion of Visual and Language Representations by Dense Symmetric Co-attention for Visual Question Answering [O]. Duy-Kien Nguyen, Takayuki Okatani. 2018.
8. Answering Questions, Questioning Answers: Evaluating Data Quality in an Establishment Survey [R]. Goldenberg, K. L. 2008.
