STAT: Spatial-Temporal Attention Mechanism for Video Captioning

Abstract

Video captioning is the task of automatically generating natural language sentences that summarize the content of a video. Inspired by the human visual attention mechanism, temporal attention has been widely used in video description to focus selectively on important frames. However, most existing methods based on temporal attention suffer from recognition errors and missing details, because temporal attention alone cannot locate the significant regions within frames. To address these problems, we propose a novel spatial-temporal attention mechanism (STAT) within an encoder-decoder neural network for video captioning. STAT takes into account both the spatial and temporal structures in a video, so it enables the decoder to automatically select the significant regions in the most relevant temporal segments for word prediction. We evaluate STAT on two well-known benchmarks: MSVD and MSR-VTT-10K. Experimental results show that STAT achieves state-of-the-art performance on several popular evaluation metrics: BLEU-4, METEOR, and CIDEr.
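
The idea of attending first over regions within each frame and then over frames can be illustrated with a minimal NumPy sketch. This is not the formulation from the paper, which the abstract does not give. It assumes generic additive (tanh-based) attention scoring, and every name in it (`additive_scores`, `spatial_temporal_attention`, the parameter layout, and the tensor sizes) is hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def additive_scores(feats, hidden, W_f, W_h, w):
    # Generic additive attention score (an assumption, not the paper's formula).
    # feats: (..., D), hidden: (H,), W_f: (D, A), W_h: (H, A), w: (A,)
    return np.tanh(feats @ W_f + hidden @ W_h) @ w


def spatial_temporal_attention(region_feats, hidden, params):
    # region_feats: (T, R, D) = T frames, R regions per frame, D feature dims.
    # Spatial step: weight the regions inside each frame.
    s = additive_scores(region_feats, hidden, *params["spatial"])  # (T, R)
    alpha = softmax(s, axis=1)                                      # (T, R)
    frame_feats = (alpha[..., None] * region_feats).sum(axis=1)     # (T, D)

    # Temporal step: weight the attended frame vectors.
    t = additive_scores(frame_feats, hidden, *params["temporal"])   # (T,)
    beta = softmax(t, axis=0)                                       # (T,)
    context = (beta[:, None] * frame_feats).sum(axis=0)             # (D,)
    return context, alpha, beta


# Hypothetical sizes and random parameters, for demonstration only.
rng = np.random.default_rng(0)
T, R, D, H, A = 4, 6, 8, 5, 7


def init_params():
    return (rng.normal(size=(D, A)), rng.normal(size=(H, A)), rng.normal(size=A))


params = {"spatial": init_params(), "temporal": init_params()}
context, alpha, beta = spatial_temporal_attention(
    rng.normal(size=(T, R, D)), rng.normal(size=H), params
)
```

In this sketch `hidden` stands in for the decoder state at one word-prediction step. The resulting `context` vector is what a decoder would consume, `alpha` holds the per-frame region weights, and `beta` holds the frame weights.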


Bibliographic Details

Source
IEEE Transactions on Multimedia | 2020, Issue 1 | pp. 229-241 | 13 pages

Author affiliations

Univ Sci & Technol China Sch Informat Sci & Technol Hefei 230026 Peoples R China|Hangzhou Dianzi Univ Inst Informat & Control Hangzhou 310018 Peoples R China;

Hangzhou Dianzi Univ Inst Informat & Control Hangzhou 310018 Peoples R China;

Shenzhen Univ Coll Mechatron & Control Engn Shenzhen 518060 Peoples R China;

Tsinghua Univ Grad Sch Shenzhen Shenzhen 518055 Peoples R China;

Beijing Inst Technol Sci & Technol Mechatron Dynam Control Lab Beijing 100081 Peoples R China;

Univ Sci & Technol China Sch Informat Sci & Technol Hefei 230026 Peoples R China;

Tsinghua Univ Dept Automat Beijing 100084 Peoples R China;

Original format: PDF
Language: English
Keywords
Video captioning; spatial-temporal attention mechanism; encoder-decoder neural networks;


Similar Literature

1. Corrections to "STAT: Spatial-Temporal Attention Mechanism for Video Captioning" [Jan 20 229-241] [J]. IEEE Transactions on Multimedia. 2020, Issue 3.
2. Multimodal architecture for video captioning with memory networks and an attention mechanism [J]. Li Wei, Guo Dashan, Fang Xiangzhong. Pattern Recognition Letters. 2018, Issue APRa1.
3. Capturing Temporal Structures for Video Captioning by Spatio-temporal Contexts and Channel Attention Mechanism [J]. Guo Dashan, Li Wei, Fang Xiangzhong. Neural Processing Letters. 2017, Issue 1.
4. Spatial-Temporal Attention in Bi-LSTM Networks based on Multiple Features for Video Captioning [C]. Li Chu-yi, Yu Wei-yu. 2018 International Conference on Image and Video Processing, and Artificial Intelligence. 2018.
5. Video and Image Super-Resolution via Deep Learning with Attention Mechanism [D]. Xu, Xuan. 2020.
6. 5-HTTLPR polymorphism is linked to neural mechanisms of selective attention in preschoolers from lower socioeconomic status backgrounds [O]. Elif Isbell, Courtney Stevens, Amanda Hampton Wray. 2016.
7. A spatial-temporal approach for video caption detection and recognition [O]. Xiaoou Tang, Xinbo Gao. 2002.
