International Workshop on Pattern Recognition

Improving Visual Question Answering with Pre-trained Language Modeling



Abstract

Visual question answering (VQA) is a task of significant importance for research in artificial intelligence. However, most studies use simple gated recurrent units (GRUs) to extract high-level question or image features, which is not enough to achieve strong performance. In this paper, two improvements are proposed to a general VQA model based on the dynamic memory network (DMN). First, we initialize the question module of our model with a pre-trained language model. Second, we replace the GRU in the input fusion layer of the input module with a new module. Experimental results demonstrate the effectiveness of our method, with an improvement of 1.52% over the baseline on the Visual Question Answering V2 dataset.
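A minimal PyTorch sketch of the two changes described in the abstract is given below. The abstract does not name the specific pre-trained language model or the module that replaces the GRU, so BERT (via the Hugging Face transformers library) and a Transformer encoder layer are used here purely as illustrative assumptions.

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class QuestionModule(nn.Module):
    """Question encoder initialized from a pre-trained language model (assumed BERT)."""

    def __init__(self, hidden_dim=512, lm_name="bert-base-uncased"):
        super().__init__()
        self.lm = BertModel.from_pretrained(lm_name)            # pre-trained weights
        self.proj = nn.Linear(self.lm.config.hidden_size, hidden_dim)

    def forward(self, input_ids, attention_mask):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        # Use the [CLS] token representation as the question vector.
        return self.proj(out.last_hidden_state[:, 0])


class InputFusionLayer(nn.Module):
    """DMN input-module fusion layer with the GRU swapped for self-attention (assumption)."""

    def __init__(self, hidden_dim=512, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, region_feats):
        # region_feats: (batch, num_regions, hidden_dim) image region features
        return self.fusion(region_feats)   # fused facts passed to the memory module


if __name__ == "__main__":
    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    q = tok(["what color is the cat?"], return_tensors="pt")
    q_vec = QuestionModule()(q["input_ids"], q["attention_mask"])   # shape (1, 512)
    facts = InputFusionLayer()(torch.randn(1, 36, 512))             # shape (1, 36, 512)

In this sketch, the question vector and the fused image facts would feed the DMN's episodic memory module as in the original architecture; only the two components highlighted in the abstract are changed.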

