Journal: Neurocomputing

A comparative study of language transformers for video question answering



Abstract

Visual question answering (VQA), which aims to correctly answer questions about images or videos, has developed quickly in recent years. However, current VQA systems mainly focus on questions about a single image and still face many challenges with video-based questions. Video VQA must not only capture how content evolves across video frames but also understand the corresponding subtitles. In this paper, we propose a language Transformer-based video question answering model to encode the complex semantics of video clips. Unlike previous models, which represent visual features with recurrent neural networks, our model encodes visual concept sequences with a pre-trained language Transformer. We investigate the performance of our model using four language Transformers on two different datasets. The results demonstrate outstanding improvements compared to previous work. (c) 2021 Elsevier B.V. All rights reserved.
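
To make the idea of "visual concept sequences" concrete, the following is a minimal sketch of how per-frame concept labels, subtitles, and a question might be turned into plain-text inputs for a language Transformer. It is not the paper's implementation: the abstract gives no such details, so the function names (`concepts_to_text`, `build_candidate_inputs`), the BERT-style `[CLS]`/`[SEP]` markers, the multiple-choice setup, and the rule for dropping labels repeated from the previous frame are all assumptions made for illustration.

```python
from typing import List


def concepts_to_text(frame_concepts: List[List[str]]) -> str:
    """Flatten per-frame concept labels into one temporally ordered string.

    Assumption: a label already present in the previous frame is skipped,
    so the sequence mostly records what changes between frames.
    """
    words: List[str] = []
    previous: set = set()
    for labels in frame_concepts:
        words.extend(label for label in labels if label not in previous)
        previous = set(labels)
    return " ".join(words)


def build_candidate_inputs(
    frame_concepts: List[List[str]],
    subtitles: List[str],
    question: str,
    candidates: List[str],
) -> List[str]:
    """Build one text input per candidate answer (hypothetical layout).

    Each string pairs the video context (concepts plus subtitles) with the
    question and one candidate; a tokenizer and a pre-trained language
    Transformer would then score each string.
    """
    context = f"{concepts_to_text(frame_concepts)} {' '.join(subtitles)}".strip()
    return [
        f"[CLS] {context} [SEP] {question} {answer} [SEP]"
        for answer in candidates
    ]


if __name__ == "__main__":
    frames = [["man", "kitchen"], ["man", "knife"], ["knife", "onion"]]
    inputs = build_candidate_inputs(
        frames,
        ["I am making dinner."],
        "What is he cutting?",
        ["an onion", "a phone"],
    )
    for text in inputs:
        print(text)
```

In a sketch like this the visual stream becomes ordinary text, which is what would let a pre-trained language Transformer take the place of a recurrent encoder over visual features.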
