Integrating Transformer into Global and Residual Image Feature Extractor in Visual Question Answering for Blind People

Abstract

Visual Question Answering (VQA), a novel task at the intersection of Computer Vision (CV) and Natural Language Processing (NLP), extracts answers from features of both questions and images. Current VQA approaches rely on combining convolutional and recurrent networks, which leads to a large number of parameters in the learning phase. Building on the success of pre-trained models, we integrate BERT [1] for embedding text and two models, ResNets [2] and VGG [3], for embedding images. In addition, we propose to take advantage of fine-tuning techniques and a stacked attention mechanism to combine textual and visual features in a novel learning phase, chosen for its ability to reduce model size. To demonstrate our model's performance, we conduct experiments on the VizWiz VQA Challenge 2020. According to the experimental results, the proposed approach outperforms existing methods for Yes-No questions on the VizWiz VQA dataset.
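The abstract does not spell out how the stacked attention mechanism fuses the two feature streams. As a rough illustration only, the sketch below follows a commonly used stacked-attention form: a question vector scores each image region, the attention-weighted image vector is added back to the question vector, and the refined vector queries the image again in the next hop. The function names, the dimensions, and the randomly initialised weights are all hypothetical; in the paper's setting the question vector would come from BERT and the region vectors from ResNet or VGG, and the authors' actual formulation may differ.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(regions, query, W_i, W_q, w_p):
    """One attention hop over image regions.

    Assumes region vectors and the query share the same dimension.
    Returns (refined_query, attention_weights).
    """
    q_proj = matvec(W_q, query)
    scores = []
    for v in regions:
        v_proj = matvec(W_i, v)
        hidden = [math.tanh(a + b) for a, b in zip(v_proj, q_proj)]
        scores.append(sum(w * h for w, h in zip(w_p, hidden)))
    weights = softmax(scores)
    # Attention-weighted sum of the region features.
    attended = [sum(p * v[d] for p, v in zip(weights, regions))
                for d in range(len(query))]
    # Residual update: the refined query feeds the next hop.
    refined = [a + q for a, q in zip(attended, query)]
    return refined, weights


def stacked_attention(regions, question, layers):
    """Apply several attention hops in sequence."""
    query = question
    all_weights = []
    for W_i, W_q, w_p in layers:
        query, weights = attention_layer(regions, query, W_i, W_q, w_p)
        all_weights.append(weights)
    return query, all_weights


def random_layer(dim, hidden, rng):
    """Hypothetical randomly initialised parameters for one hop."""
    W_i = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    W_q = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w_p = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return W_i, W_q, w_p


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hidden, num_regions = 4, 3, 5
    # Stand-ins for image region features and a question embedding.
    regions = [[rng.uniform(-1, 1) for _ in range(dim)]
               for _ in range(num_regions)]
    question = [rng.uniform(-1, 1) for _ in range(dim)]
    layers = [random_layer(dim, hidden, rng) for _ in range(2)]
    fused, weights = stacked_attention(regions, question, layers)
    print("fused vector:", fused)
    print("attention per hop:", weights)
```

In a full model the fused vector would typically be passed to a classifier over candidate answers; that step is omitted here because the abstract gives no details about it.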

Bibliographic details

Source
International Conference on Knowledge and Systems Engineering | 2020 | pp. 31-36 | 6 pages
Authors
Tung Le; Nguyen Tien Huy; Nguyen Le Minh
Original format: PDF
Keywords
Visualization; Feature extraction; Knowledge discovery; Natural language processing; Modeling; Task analysis; Image classification;

Similar literature

1. A novel feature extractor for human action recognition in visual question answering [J]. Silva Francisco H. dos S., Bezerra Gabriel M., Holanda Gabriel B., Pattern recognition letters. 2021, issue Jula

2. BETTER GENERIC OBJECTS COUNTING WHEN ASKING QUESTIONS TO IMAGES: A MULTITASK APPROACH FOR REMOTE SENSING VISUAL QUESTION ANSWERING [J]. S. Lobry, D. Marcos, B. Kellenberger, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2020, issue 5

3. Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering [J]. Manmadhan Sruthy, Kovoor Binsu C. Image and Vision Computing. 2021, issue Nova

4. VizWiz Grand Challenge: Answering Visual Questions from Blind People [C]. Danna Gurari, Qing Li, Abigale J. Stangl, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018

5. Social Microvolunteering: Quick, Free Answers to Visual Questions from Blind People [D]. Brady, Erin 2015

6. A dataset of clinically generated visual questions and answers about radiology images [O]. Jason J. Lau, Soumya Gayen, Asma Ben Abacha, 2018

7. Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering [O]. Soravit Changpinyo, Bo Pang, Piyush Sharma, 2019
