IEEE Transactions on Image Processing

Compositional Attention Networks With Two-Stream Fusion for Video Question Answering

Abstract

Given a video, Video Question Answering (VideoQA) aims at answering arbitrary free-form questions about the video content in natural language. A successful VideoQA framework usually has two key components: 1) a discriminative video encoder that learns an effective video representation, preserving as much information about the video as possible, and 2) a question-guided decoder that learns to select the most relevant features for spatiotemporal reasoning and to output the correct answer. We propose compositional attention networks (CAN) with two-stream fusion for VideoQA tasks. For the encoder, we sample video snippets using a two-stream mechanism (i.e., a uniform sampling stream and an action pooling stream) and extract a sequence of visual features from each stream to represent the video semantics. For the decoder, we propose a compositional attention module that integrates the two-stream features with the attention mechanism. The compositional attention module is the core of CAN and can be seen as a modular combination of a unified attention block. With different fusion strategies, we devise five compositional attention module variants. We evaluate our approach on one long-term VideoQA dataset, ActivityNet-QA, and two short-term VideoQA datasets, MSRVTT-QA and MSVD-QA. Our CAN model achieves new state-of-the-art results on all the datasets.
