...

STAT: Spatial-Temporal Attention Mechanism for Video Captioning



Abstract

Video captioning refers to automatically generating natural language sentences that summarize video content. Inspired by the human visual attention mechanism, temporal attention has been widely used in video description to selectively focus on important frames. However, most existing methods based on temporal attention suffer from recognition errors and missing details, because temporal attention alone cannot capture the significant regions within frames. To address these problems, we propose a novel spatial-temporal attention mechanism (STAT) within an encoder-decoder neural network for video captioning. The proposed STAT takes into account both the spatial and temporal structures of a video, enabling the decoder to automatically select the significant regions in the most relevant temporal segments for word prediction. We evaluate STAT on two well-known benchmarks: MSVD and MSR-VTT-10K. Experimental results show that the proposed STAT achieves state-of-the-art performance on several popular evaluation metrics: BLEU-4, METEOR, and CIDEr.
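The abstract describes attention applied first over regions within each frame and then over frames. As a rough illustration only, the sketch below shows one generic way such a two-stage attention could be computed at a single decoding step. It is not the paper's formulation: the scoring function, the parameter names (`ws_feat`, `ws_hid`, `wt_feat`, `wt_hid`), the feature shapes, and the random inputs are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine equal-length vectors using the given scalar weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def score(feature, hidden, w_feat, w_hid):
    # Hypothetical relevance score: tanh of a linear function of the
    # visual feature and the decoder hidden state (an assumption).
    return math.tanh(dot(w_feat, feature) + dot(w_hid, hidden))


def spatial_temporal_attention(video, hidden, p):
    """video[t][r] is the feature vector of region r in frame t."""
    frame_feats, alphas = [], []
    # Spatial stage: attend over the regions inside each frame.
    for regions in video:
        alpha = softmax([score(r, hidden, p["ws_feat"], p["ws_hid"])
                         for r in regions])
        alphas.append(alpha)
        frame_feats.append(weighted_sum(alpha, regions))
    # Temporal stage: attend over the spatially attended frame features.
    beta = softmax([score(f, hidden, p["wt_feat"], p["wt_hid"])
                    for f in frame_feats])
    context = weighted_sum(beta, frame_feats)
    return context, alphas, beta


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Toy input: T frames, R regions per frame, D-dim features, H-dim decoder state.
random.seed(0)
T, R, D, H = 4, 3, 5, 6
video = [[rand_vec(D) for _ in range(R)] for _ in range(T)]
hidden = rand_vec(H)
params = {"ws_feat": rand_vec(D), "ws_hid": rand_vec(H),
          "wt_feat": rand_vec(D), "wt_hid": rand_vec(H)}
context, alphas, beta = spatial_temporal_attention(video, hidden, params)
```

In this sketch, `context` is the single vector a decoder would consume to predict the next word, `alphas[t]` holds the region weights for frame `t`, and `beta` holds the frame weights. In a trained model the scoring parameters would be learned and the attention recomputed at every decoding step.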

Bibliographic information

  • Source
    IEEE Transactions on Multimedia | 2020, Issue 1 | pp. 229-241 | 13 pages
  • Authors

  • Author affiliations

    Univ Sci & Technol China Sch Informat Sci & Technol Hefei 230026 Peoples R China|Hangzhou Dianzi Univ Inst Informat & Control Hangzhou 310018 Peoples R China;

    Hangzhou Dianzi Univ Inst Informat & Control Hangzhou 310018 Peoples R China;

    Shenzhen Univ Coll Mechatron & Control Engn Shenzhen 518060 Peoples R China;

    Tsinghua Univ Grad Sch Shenzhen Shenzhen 518055 Peoples R China;

    Beijing Inst Technol Sci & Technol Mechatron Dynam Control Lab Beijing 100081 Peoples R China;

    Univ Sci & Technol China Sch Informat Sci & Technol Hefei 230026 Peoples R China;

    Tsinghua Univ Dept Automat Beijing 100084 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Video captioning; spatial-temporal attention mechanism; encoder-decoder neural networks;

