2018 International Conference on Image and Video Processing, and Artificial Intelligence

Spatial-Temporal Attention in Bi-LSTM Networks Based on Multiple Features for Video Captioning



Abstract

Automatically generating rich natural language descriptions for open-domain videos is among the most challenging tasks in computer vision, natural language processing, and machine learning. Building on the general encoder-decoder framework, we propose a bidirectional long short-term memory (Bi-LSTM) network with spatial-temporal attention over multiple features of objects, activities, and scenes. The network learns valuable and complementary high-level visual representations and dynamically focuses on the most important context information of diverse frames within different subsets of videos. Experimental results show that the proposed method achieves performance competitive with or better than the state of the art on the MSVD video dataset.
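The abstract does not specify the attention formulation in detail; the following is a minimal numpy sketch of generic soft temporal attention over per-frame features, as commonly used in encoder-decoder video captioning. All names, dimensions, and the scoring function (additive/tanh attention) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    z = np.exp(x - x.max())
    return z / z.sum()

def temporal_attention(frame_feats, hidden, W_v, W_h, w):
    """Soft temporal attention (illustrative): score each frame feature
    against the decoder hidden state, then return the attention-weighted
    context vector and the attention weights."""
    scores = np.tanh(frame_feats @ W_v + hidden @ W_h) @ w  # shape (T,)
    alpha = softmax(scores)        # attention weights over frames, sum to 1
    context = alpha @ frame_feats  # weighted sum of frame features, shape (d,)
    return context, alpha

# Hypothetical sizes: T frames, d-dim features, a-dim attention space.
T, d, a = 8, 16, 32
frame_feats = rng.normal(size=(T, d))  # per-frame features (e.g., objects/scenes)
hidden = rng.normal(size=d)            # decoder LSTM hidden state
W_v = rng.normal(size=(d, a))
W_h = rng.normal(size=(d, a))
w = rng.normal(size=a)

context, alpha = temporal_attention(frame_feats, hidden, W_v, W_h, w)
```

At each decoding step the context vector would be fed to the LSTM decoder alongside the previous word embedding; spatial attention would apply the same idea within a frame's feature map rather than across frames.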
