International Workshop on Pattern Recognition

Improving Visual Question Answering with Pre-trained Language Modeling



Abstract

Visual question answering (VQA) is a task of significant importance for research in artificial intelligence. However, most studies use simple gated recurrent units (GRUs) to extract high-level question or image features, which is not enough to achieve strong performance. In this paper, two improvements are proposed to a general VQA model based on the dynamic memory network (DMN). First, we initialize the question module of our model with a pre-trained language model. Second, we replace the GRU in the input fusion layer of the input module with a new module. Experimental results demonstrate the effectiveness of our method, with an improvement of 1.52% over the baseline on the Visual Question Answering V2 dataset.
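A minimal PyTorch sketch of the two changes described in the abstract is given below. The abstract does not name the specific pre-trained language model or the module that replaces the GRU, so BERT (via the Hugging Face transformers library) and a Transformer encoder layer are used here purely as illustrative assumptions.

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class QuestionModule(nn.Module):
    """Question encoder initialized from a pre-trained language model (assumed BERT)."""

    def __init__(self, hidden_dim=512, lm_name="bert-base-uncased"):
        super().__init__()
        self.lm = BertModel.from_pretrained(lm_name)            # pre-trained weights
        self.proj = nn.Linear(self.lm.config.hidden_size, hidden_dim)

    def forward(self, input_ids, attention_mask):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        # Use the [CLS] token representation as the question vector.
        return self.proj(out.last_hidden_state[:, 0])


class InputFusionLayer(nn.Module):
    """DMN input-module fusion layer with the GRU swapped for self-attention (assumption)."""

    def __init__(self, hidden_dim=512, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, region_feats):
        # region_feats: (batch, num_regions, hidden_dim) image region features
        return self.fusion(region_feats)   # fused facts passed to the memory module


if __name__ == "__main__":
    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    q = tok(["what color is the cat?"], return_tensors="pt")
    q_vec = QuestionModule()(q["input_ids"], q["attention_mask"])   # shape (1, 512)
    facts = InputFusionLayer()(torch.randn(1, 36, 512))             # shape (1, 36, 512)

In this sketch, the question vector and the fused image facts would feed the DMN's episodic memory module as in the original architecture; only the two components highlighted in the abstract are changed.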

