Conference: IEEE Conference on Computer Vision and Pattern Recognition

Stacked Attention Networks for Image Question Answering

Abstract

This paper presents stacked attention networks (SANs) that learn to answer natural language questions from images. SANs use the semantic representation of a question as a query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. We therefore develop a multiple-layer SAN that queries an image multiple times to infer the answer progressively. Experiments on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. Visualization of the attention layers illustrates how the SAN locates, layer by layer, the relevant visual clues that lead to the answer.
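The idea of querying an image several times can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a generic additive attention layer (a tanh projection followed by a softmax over regions) and assumes that each layer refines the query by adding the attended image vector to it. All names (`attention_layer`, `stacked_attention`, `W_img`, `W_qry`, `w_score`), the dimensions, and the random weights are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(regions, query, W_img, W_qry, w_score):
    """One attention hop (assumed form): score each region against the
    query, pool the regions by those scores, and refine the query."""
    q_proj = matvec(W_qry, query)
    scores = []
    for v in regions:
        v_proj = matvec(W_img, v)
        hidden = [math.tanh(a + b) for a, b in zip(v_proj, q_proj)]
        scores.append(sum(w * h for w, h in zip(w_score, hidden)))
    weights = softmax(scores)  # attention distribution over regions
    dim = len(query)
    attended = [sum(p * v[i] for p, v in zip(weights, regions))
                for i in range(dim)]
    # Assumption: the refined query is the attended vector plus the old query.
    refined = [a + q for a, q in zip(attended, query)]
    return refined, weights


def stacked_attention(regions, question, layers):
    """Query the image once per layer, feeding each refined query forward."""
    query = question
    history = []
    for W_img, W_qry, w_score in layers:
        query, weights = attention_layer(regions, query, W_img, W_qry, w_score)
        history.append(weights)
    return query, history


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 6 image regions, 4-dim features, 2 attention layers.
rng = random.Random(0)
dim, hidden, num_regions, num_layers = 4, 3, 6, 2
regions = random_matrix(num_regions, dim, rng)
question = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
layers = [
    (random_matrix(hidden, dim, rng),
     random_matrix(hidden, dim, rng),
     [rng.uniform(-0.5, 0.5) for _ in range(hidden)])
    for _ in range(num_layers)
]

final_query, history = stacked_attention(regions, question, layers)
for k, weights in enumerate(history, start=1):
    print(f"layer {k} attention:", [round(p, 3) for p in weights])
```

In a trained model the image features would come from a vision encoder, the question vector from a language encoder, and the final query would feed an answer classifier. Here the weights are random, so the printed distributions only show the data flow, not meaningful attention.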
