Journal: IEEE Transactions on Image Processing

Unifying the Video and Question Attentions for Open-Ended Video Question Answering


Abstract

Video question answering is an important task for scene understanding and visual data retrieval. However, current visual question answering work focuses mainly on single static images, which differ from the dynamic, sequential visual data of the real world, so these approaches cannot exploit the temporal information in videos. In this paper, we introduce the task of free-form, open-ended video question answering. Open-ended answers enable wider applications than the multiple-choice tasks common in Visual-QA. We first propose a data set for open-ended Video-QA, built with automatic question generation approaches. We then propose two models, sequential video attention and temporal question attention. These models apply the attention mechanism to videos and questions while preserving the sequential and temporal structures of the guides. The two models are integrated into a unified attention model. After the video and the question are encoded, a decoder generates the answer word by word. Finally, we evaluate our models on the proposed data set. The experimental results demonstrate the effectiveness of the proposed model.
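
The abstract names attention over video and question but gives no equations. As a rough illustration only, the sketch below shows generic question-guided soft attention over per-frame features, using scaled dot-product scoring. This is an assumption for illustration, not the paper's sequential video attention or temporal question attention, and it does not model the sequential structure those models are said to preserve. The names `attend`, `frames`, and `question`, and the toy feature values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, question):
    """Generic question-guided soft attention over frame features.

    frames:   list of T frame feature vectors, each a list of d floats.
    question: one question vector of d floats (hypothetical encoding).
    Returns (weights, context); context is the weighted sum of the frames.
    """
    scale = math.sqrt(len(question))
    # Score each frame by its scaled dot product with the question vector.
    scores = [
        sum(f_k * q_k for f_k, q_k in zip(frame, question)) / scale
        for frame in frames
    ]
    weights = softmax(scores)
    dim = len(frames[0])
    # Blend the frame features using the attention weights.
    context = [
        sum(w * frame[k] for w, frame in zip(weights, frames))
        for k in range(dim)
    ]
    return weights, context


# Toy input: three frames with two-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.0]
weights, context = attend(frames, question)
```

Here the weights sum to one, and frames whose features align with the question vector receive the larger weights. In an encoder-decoder setup such as the one the abstract describes, a context vector of this kind would typically be passed to the decoder that emits the answer one word at a time.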
