
Residual attention-based LSTM for video captioning


Abstract

Recently, great success has been achieved in video captioning by frameworks with hierarchical LSTMs, such as stacked LSTM networks. However, once deeper LSTM layers start to converge, a degradation problem is exposed: as the number of LSTM layers increases, accuracy saturates and then degrades rapidly, much as in standard deep convolutional networks such as VGG. In this paper, we propose a novel attention-based framework, Residual Attention-based LSTM (Res-ATT), which not only takes advantage of an existing attention mechanism but also accounts for sentence-internal information that is usually lost in the transmission process. Our key novelty is showing how to integrate residual mapping into a hierarchical LSTM network to mitigate the degradation problem. More specifically, our hierarchical architecture builds on two LSTM layers, and residual mapping is introduced to avoid losing information about previously generated words (i.e., both content and relationship information). Experimental results on the mainstream datasets MSVD and MSR-VTT show that our framework outperforms state-of-the-art approaches. Furthermore, our automatically generated sentences provide more detailed information that precisely describes a video.
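The residual mapping described in the abstract amounts to a skip connection between the two LSTM layers: the lower layer's hidden state is added back onto the upper layer's output, so information about previously generated words is not lost between layers. The following is a minimal, hypothetical NumPy sketch of that idea only (untrained random weights, single time step); the actual Res-ATT model additionally applies temporal attention over video features, which is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal single-step LSTM cell (illustrative, untrained random weights)."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # Combined weight matrix for input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_size
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        g = np.tanh(z[2*H:3*H])    # candidate cell state
        o = sigmoid(z[3*H:4*H])    # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

hidden = 8
lstm1 = LSTMCell(hidden, hidden, seed=1)  # lower LSTM layer
lstm2 = LSTMCell(hidden, hidden, seed=2)  # upper LSTM layer

x = np.ones(hidden)                       # stand-in for a word/feature embedding
h1 = c1 = h2 = c2 = np.zeros(hidden)

h1, c1 = lstm1.step(x, h1, c1)            # lower layer step
h2, c2 = lstm2.step(h1, h2, c2)           # upper layer consumes lower output
# Residual mapping: add the lower layer's hidden state back to the upper
# layer's output, so the stacked network only has to learn the residual.
out = h2 + h1
```

The design intuition mirrors residual networks in CNNs: if the upper LSTM layer contributes nothing useful, `out` still carries the lower layer's state unchanged, which is what counteracts the degradation problem as depth grows.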

Bibliographic details

  • Source
    World Wide Web | 2019, Issue 2 | pp. 621-636 | 16 pages
  • Author affiliations

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Sichuan, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Sichuan, Peoples R China;

    Beijing Afanti Technol Co LTD, Beijing, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Sichuan, Peoples R China;

  • Indexing
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    LSTM; Attention mechanism; Residual thought; Video captioning;


