Journal: Information Fusion

Information fusion in visual question answering: A Survey

Abstract

Visual question answering (VQA) automatically answers natural-language questions about the content of an image or video. The task is challenging because it requires understanding the semantic information in both the textual and visual channels, as well as their interplay. A typical solver consists of three components: feature extraction from each single modality, feature fusion between the visual and textual channels, and answer prediction based on the learnt joint representation. Among these, information fusion plays a key role in improving overall accuracy, and various types of fusion approaches have been proposed, such as simple vector operators, deep neural networks, bilinear pooling, attention mechanisms, and memory networks. The primary objective of this survey is to provide a clear organization and comprehensive review of the fusion techniques proposed so far in the domain of visual question answering. We propose an abstract fusion framework that fits the majority of existing VQA models, allowing readers to quickly grasp their key contributions. Finally, we summarize the effective fusion strategies that have been widely adopted, to help readers in their own model design.
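To make the three-stage pipeline and some of the fusion families named above concrete, here is a minimal sketch in plain Python on toy vectors. It is an illustration under stated assumptions, not the survey's framework: image region features and a question vector are assumed to be already extracted (stage one is omitted), and all function names (`concat`, `elementwise_product`, `bilinear`, `attend`, `predict`) are hypothetical. Only simple vector operators, naive full bilinear pooling, and single-step question-guided attention are shown; deep-network and memory-network fusion are left out. Published VQA models often replace the naive bilinear form with low-rank or compact approximations, because the full weight tensor grows with the product of the input dimensions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# --- Simple vector operators (hypothetical helpers) ---
def concat(v: Vector, q: Vector) -> Vector:
    """Join the visual and textual vectors end to end."""
    return v + q

def elementwise_sum(v: Vector, q: Vector) -> Vector:
    """Add matching dimensions; assumes len(v) == len(q)."""
    return [a + b for a, b in zip(v, q)]

def elementwise_product(v: Vector, q: Vector) -> Vector:
    """Multiply matching dimensions; assumes len(v) == len(q)."""
    return [a * b for a, b in zip(v, q)]

# --- Bilinear pooling (naive, full-rank form) ---
def bilinear(v: Vector, q: Vector, weights: List[Matrix]) -> Vector:
    """Output unit k is v^T W_k q, where weights[k] is a len(v) x len(q) matrix."""
    return [
        sum(v[i] * W[i][j] * q[j] for i in range(len(v)) for j in range(len(q)))
        for W in weights
    ]

# --- Question-guided attention over image regions ---
def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(regions: List[Vector], q: Vector) -> Vector:
    """Score each region by its dot product with q, then take the softmax-weighted sum."""
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, q)) for r in regions]
    alpha = softmax(scores)
    dim = len(regions[0])
    return [sum(a * r[d] for a, r in zip(alpha, regions)) for d in range(dim)]

# --- Answer prediction from the joint representation ---
def predict(joint: Vector, classifier: Matrix) -> int:
    """Linear score per candidate answer; return the index of the best one."""
    logits = [sum(w * x for w, x in zip(row, joint)) for row in classifier]
    return max(range(len(logits)), key=logits.__getitem__)

if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    q = [0.0, 2.0]                       # toy question vector
    v = attend(regions, q)               # attention favours the second region
    joint = elementwise_product(v, q)    # fuse the modalities
    print(predict(joint, [[1.0, 0.0], [0.0, 1.0]]))  # index of the chosen answer
```

In this toy run the question vector aligns with the second region, so attention weights that region more heavily and the identity classifier selects answer index 1. Swapping `elementwise_product` for `concat` or `bilinear` changes only the fusion stage, which mirrors the survey's point that fusion is the interchangeable core between feature extraction and answer prediction.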