Improving Visual Question Answering using Active Perception on Static Images

Abstract

Visual Question Answering (VQA) is one of the most challenging emerging applications of deep learning. Providing powerful attention mechanisms is crucial for VQA, since the model must correctly identify the region of an image that is relevant to the question at hand. However, existing models analyze the input images at a fixed and typically small resolution, often leading to discarding valuable fine-grained details. To overcome this limitation, in this work we propose a reinforcement learning-based active perception approach that works by applying a series of transformation operations on the images (translation, zoom) in order to facilitate answering the question at hand. This allows for performing fine-grained analysis, effectively increasing the resolution at which the models process information. The proposed method is orthogonal to existing attention mechanisms and it can be combined with most existing VQA methods. The effectiveness of the proposed method is experimentally demonstrated on a challenging VQA dataset.
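
The translation and zoom operations mentioned in the abstract can be pictured as moves of a crop window over the full-resolution image, with each crop then resized to the model's fixed input size. The sketch below is only an illustration under that assumption and is not the authors' implementation: the `Viewport` class, the action names, and the `shift`, `zoom`, and `min_scale` parameters are all hypothetical, and the reinforcement learning policy that would choose the actions is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Crop window in normalized image coordinates (hypothetical helper)."""
    cx: float = 0.5     # window centre along x, in [0, 1]
    cy: float = 0.5     # window centre along y, in [0, 1]
    scale: float = 1.0  # fraction of each image side covered by the window


# Hypothetical discrete action set: four translations, two zooms, and a stop.
ACTIONS = ("left", "right", "up", "down", "zoom_in", "zoom_out", "stop")


def step(view, action, shift=0.1, zoom=0.8, min_scale=0.1):
    """Apply one action and return the new viewport, kept inside the image."""
    cx, cy, scale = view.cx, view.cy, view.scale
    if action == "left":
        cx -= shift * scale   # shift is relative to the current window size
    elif action == "right":
        cx += shift * scale
    elif action == "up":
        cy -= shift * scale
    elif action == "down":
        cy += shift * scale
    elif action == "zoom_in":
        scale = max(min_scale, scale * zoom)
    elif action == "zoom_out":
        scale = min(1.0, scale / zoom)
    elif action != "stop":
        raise ValueError(f"unknown action: {action!r}")
    # Clamp the centre so the window never leaves the image.
    half = scale / 2
    cx = min(max(cx, half), 1 - half)
    cy = min(max(cy, half), 1 - half)
    return Viewport(cx, cy, scale)


def to_pixels(view, width, height):
    """Convert a viewport to a (left, top, right, bottom) pixel box."""
    half_w = view.scale * width / 2
    half_h = view.scale * height / 2
    return (
        round(view.cx * width - half_w),
        round(view.cy * height - half_h),
        round(view.cx * width + half_w),
        round(view.cy * height + half_h),
    )
```

Under this framing, a crop covering half of each image side that is resized to the same fixed input size carries twice as many source pixels per unit of scene along each axis, which is one way to read the abstract's claim about effectively increasing resolution.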

Bibliographic record

Source
International Conference on Pattern Recognition | 2021 | pp. 879-884 | 6 pages
Authors
Theodoros Bozinis; Nikolaos Passalis; Anastasios Tefas
Original format: PDF
Keywords
Deep learning; Visualization; Analytical models; Image resolution; Active perception; Reinforcement learning; Knowledge discovery
