Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Abstract

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, improving the best published result in terms of CIDEr score from 114.7 to 117.9 and BLEU-4 from 35.2 to 36.9. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
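
The division of labour described in the abstract (region proposals each carrying a feature vector, with a top-down signal deciding how to weight them) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: the additive `tanh` scoring form, the function name `attend`, the parameter names `W_v`, `W_q`, `w_a`, and the tiny dimensions are hypothetical, and random arrays stand in for Faster R-CNN region features and for learned weights.

```python
import numpy as np

def attend(region_feats, query, W_v, W_q, w_a):
    """Weight k region features using a top-down query vector.

    region_feats: (k, d_v) array, one feature vector per proposed region.
    query:        (d_q,) array, the task context (e.g. a question encoding).
    Returns the (k,) attention weights and the (d_v,) weighted feature.
    """
    # Project regions and query into a shared hidden space (assumed additive form).
    hidden = np.tanh(region_feats @ W_v + query @ W_q)   # (k, d_h)
    scores = hidden @ w_a                                 # (k,) one score per region
    scores = scores - scores.max()                        # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()       # non-negative, sums to 1
    pooled = weights @ region_feats                       # (d_v,) weighted sum
    return weights, pooled

# Hypothetical sizes: 5 regions, 8-dim features, 6-dim query, 4-dim hidden layer.
rng = np.random.default_rng(0)
k, d_v, d_q, d_h = 5, 8, 6, 4
region_feats = rng.standard_normal((k, d_v))   # stand-in for bottom-up features
query = rng.standard_normal(d_q)               # stand-in for top-down context
W_v = rng.standard_normal((d_v, d_h))
W_q = rng.standard_normal((d_q, d_h))
w_a = rng.standard_normal(d_h)

weights, pooled = attend(region_feats, query, W_v, W_q, w_a)
```

In this sketch the bottom-up stage only fixes which regions exist; the query changes nothing but the weights, so the same region features could be reused for captioning or for VQA.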


Bibliographic details

Authors
Peter Anderson; Xiaodong He; Chris Buehler; Damien Teney; Mark Johnson; Stephen Gould; Lei Zhang
Year: 2018
Original format: PDF

Similar literature

1. Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering [J]. Manmadhan Sruthy, Kovoor Binsu C. Image and Vision Computing. 2021.
2. Question-Led object attention for visual question answering [J]. Gao Lianli, Cao Liangfu, Xu Xing. Neurocomputing. 2020, issue May 28.
3. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering [J]. Pan Lu, Lei Ji, Wei Zhang. SIGKDD explorations. 2018.
4. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [C]. Peter Anderson, Xiaodong He, Chris Buehler. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018.
5. An Analysis of Bottom-Up Attention Models and Multimodal Representation Learning for Visual Question Answering [D]. Narayanan, Venkatraman. 2019.
6. Social Image Captioning: Exploring Visual Attention and User Attention [O]. Leiquan Wang, Xiaoliang Chu, Weishan Zhang. 2018.
7. Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering [O]. Soravit Changpinyo, Bo Pang, Piyush Sharma. 2019.