Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abstract

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Overall, our experiments show that current VQA attention models do not seem to be looking at the same regions as humans.
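The abstract mentions that model-generated attention maps are compared to human attention maps quantitatively via rank-order correlation. Below is a minimal sketch of that kind of comparison (not the authors' released evaluation code), assuming both maps are NumPy arrays already resampled to a common resolution; the function and variable names are illustrative.

```python
# Minimal sketch: Spearman rank-order correlation between a model attention
# map and a human attention map. Assumes both maps are 2D numpy arrays of
# the same shape; this is an illustrative example, not the paper's code.
import numpy as np
from scipy.stats import spearmanr


def rank_correlation(model_attention: np.ndarray, human_attention: np.ndarray) -> float:
    """Spearman rank-order correlation between two attention maps."""
    assert model_attention.shape == human_attention.shape
    # Flatten both maps so each spatial location is one observation,
    # then compute the rank correlation.
    rho, _p_value = spearmanr(model_attention.ravel(), human_attention.ravel())
    return float(rho)


if __name__ == "__main__":
    # Toy example with random maps; real usage would load a VQA-HAT human
    # attention map and the model's spatial attention weights.
    rng = np.random.default_rng(0)
    model_map = rng.random((14, 14))
    human_map = rng.random((14, 14))
    print(f"rank correlation: {rank_correlation(model_map, human_map):.3f}")
```

A higher correlation means the model's attention ranks image regions more like humans do; the paper's overall finding is that current VQA attention models score low on this measure.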
