Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abstract

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Overall, our experiments show that current VQA attention models do not seem to be looking at the same regions as humans.
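The abstract mentions that model-generated attention maps are compared to human attention maps quantitatively via rank-order correlation. Below is a minimal sketch of that kind of comparison (not the authors' released evaluation code), assuming both maps are NumPy arrays already resampled to a common resolution; the function and variable names are illustrative.

```python
# Minimal sketch: Spearman rank-order correlation between a model attention
# map and a human attention map. Assumes both maps are 2D numpy arrays of
# the same shape; this is an illustrative example, not the paper's code.
import numpy as np
from scipy.stats import spearmanr


def rank_correlation(model_attention: np.ndarray, human_attention: np.ndarray) -> float:
    """Spearman rank-order correlation between two attention maps."""
    assert model_attention.shape == human_attention.shape
    # Flatten both maps so each spatial location is one observation,
    # then compute the rank correlation.
    rho, _p_value = spearmanr(model_attention.ravel(), human_attention.ravel())
    return float(rho)


if __name__ == "__main__":
    # Toy example with random maps; real usage would load a VQA-HAT human
    # attention map and the model's spatial attention weights.
    rng = np.random.default_rng(0)
    model_map = rng.random((14, 14))
    human_map = rng.random((14, 14))
    print(f"rank correlation: {rank_correlation(model_map, human_map):.3f}")
```

A higher correlation means the model's attention ranks image regions more like humans do; the paper's overall finding is that current VQA attention models score low on this measure.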
