Journal: IEEE Transactions on Image Processing

Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks


Abstract

As a challenging task in visual information retrieval, open-ended long-form video question answering automatically generates a natural language answer from the referenced video content according to a given question. However, existing video question answering work mainly focuses on short-form video and may not transfer effectively to long-form video question answering, because it does not adequately model the semantic representation of long-form video content. In this paper, we study the problem of open-ended long-form video question answering from the viewpoint of hierarchical multi-modal conditional adversarial network learning. We propose a hierarchical attentional encoder network that learns the joint representation of long-form video content and the given question with adaptive video segmentation. We then devise a reinforced decoder network that generates the natural language answer for open-ended video question answering with multi-modal conditional adversarial network learning. We construct three large-scale open-ended video question answering datasets. Extensive experiments validate the effectiveness of our method.
