Conference: International Conference on Pattern Recognition

Improving Visual Question Answering using Active Perception on Static Images


Abstract

Visual Question Answering (VQA) is one of the most challenging emerging applications of deep learning. Providing powerful attention mechanisms is crucial for VQA, since the model must correctly identify the region of an image that is relevant to the question at hand. However, existing models analyze the input images at a fixed and typically small resolution, often leading to discarding valuable fine-grained details. To overcome this limitation, in this work we propose a reinforcement learning-based active perception approach that works by applying a series of transformation operations on the images (translation, zoom) in order to facilitate answering the question at hand. This allows for performing fine-grained analysis, effectively increasing the resolution at which the models process information. The proposed method is orthogonal to existing attention mechanisms and it can be combined with most existing VQA methods. The effectiveness of the proposed method is experimentally demonstrated on a challenging VQA dataset.
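
To make the idea of "transformation operations on the images (translation, zoom)" concrete, the following is a minimal sketch of a movable viewing window over a static image. It is not the authors' implementation: the abstract does not specify the action set, step sizes, reward, or policy network, so the names `View`, `ACTIONS`, `apply_action`, and `render`, and all default parameters, are hypothetical. The sketch only shows why zooming raises the effective resolution: the model input keeps a fixed size while the window covers a smaller part of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Square viewing window in normalized [0, 1] image coordinates."""
    cx: float = 0.5    # window centre, horizontal
    cy: float = 0.5    # window centre, vertical
    size: float = 1.0  # window side as a fraction of the image side


# Hypothetical discrete action set (assumption, not taken from the paper).
ACTIONS = ("left", "right", "up", "down", "zoom_in", "zoom_out", "stay")


def apply_action(view, action, step=0.1, zoom=0.8, min_size=0.2):
    """Return the new window after one translation or zoom action."""
    cx, cy, size = view.cx, view.cy, view.size
    if action == "left":
        cx -= step * size
    elif action == "right":
        cx += step * size
    elif action == "up":
        cy -= step * size
    elif action == "down":
        cy += step * size
    elif action == "zoom_in":
        size = max(min_size, size * zoom)
    elif action == "zoom_out":
        size = min(1.0, size / zoom)
    elif action != "stay":
        raise ValueError(f"unknown action: {action}")
    # Clamp the centre so the window stays inside the image.
    half = size / 2
    cx = min(max(cx, half), 1.0 - half)
    cy = min(max(cy, half), 1.0 - half)
    return View(cx, cy, size)


def render(image, view, out_size=4):
    """Crop the window and resample it to out_size x out_size.

    `image` is a list of rows; nearest-neighbour sampling is used.
    The output size is fixed, so a smaller window means finer detail.
    """
    h, w = len(image), len(image[0])
    half = view.size / 2
    x0 = (view.cx - half) * w
    y0 = (view.cy - half) * h
    sx = view.size * w / out_size
    sy = view.size * h / out_size
    out = []
    for r in range(out_size):
        y = min(h - 1, max(0, int(y0 + r * sy)))
        row = []
        for c in range(out_size):
            x = min(w - 1, max(0, int(x0 + c * sx)))
            row.append(image[y][x])
        out.append(row)
    return out
```

In a reinforcement learning setting, a policy conditioned on the question and the current crop would choose the next entry of `ACTIONS`, and the crop returned by `render` would be passed to an existing VQA model. The policy, the reward, and the VQA model are left out here because the abstract gives no details about them.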
