International Conference on Image and Video Processing, and Artificial Intelligence

Spatial-Temporal Attention in Bi-LSTM Networks based on Multiple Features for Video Captioning



Abstract

Automatically generating rich natural-language descriptions for open-domain videos is among the most challenging tasks in computer vision, natural language processing and machine learning. Building on the general encoder-decoder framework, we propose a bidirectional long short-term memory (Bi-LSTM) network with spatial-temporal attention over multiple features of objects, activities and scenes. The network learns valuable, complementary high-level visual representations, and dynamically focuses on the most important context information of diverse frames within different subsets of a video. Experimental results show that the proposed methods achieve performance competitive with, or better than, the state of the art on the MSVD video dataset.
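As a rough illustration of the temporal side of the attention mechanism the abstract describes, the sketch below computes soft attention weights over per-frame encoder features (such as Bi-LSTM outputs) given a decoder state. The additive scoring form and all names (`temporal_attention`, `W_f`, `W_s`, `w`) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention(frame_feats, decoder_state, W_f, W_s, w):
    """Soft attention over frames (illustrative additive scoring).

    frame_feats:   (T, d_f) per-frame encoder features, e.g. Bi-LSTM outputs
    decoder_state: (d_s,)   current decoder hidden state
    """
    # Score each frame: e_t = w^T tanh(W_f f_t + W_s s)
    scores = np.tanh(frame_feats @ W_f.T + decoder_state @ W_s.T) @ w  # (T,)
    alphas = softmax(scores)        # attention weights, one per frame
    context = alphas @ frame_feats  # (d_f,) weighted sum of frame features
    return context, alphas

rng = np.random.default_rng(0)
T, d_f, d_s, d_a = 5, 8, 6, 4   # frames, feature, state, attention dims
feats = rng.normal(size=(T, d_f))
state = rng.normal(size=d_s)
W_f = rng.normal(size=(d_a, d_f))
W_s = rng.normal(size=(d_a, d_s))
w = rng.normal(size=d_a)

ctx, alphas = temporal_attention(feats, state, W_f, W_s, w)
print(round(alphas.sum(), 6))  # softmax weights sum to 1.0
```

The context vector `ctx` would then be fed to the decoder LSTM at each word-generation step; in the paper's multi-feature setting, one such attended context per feature stream (objects, activities, scenes) would be combined before decoding.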

