
Residual attention-based LSTM for video captioning


Abstract

Recently, great success has been achieved in video captioning by frameworks with hierarchical LSTMs, such as stacked LSTM networks. However, once deeper LSTM layers start to converge, a degradation problem is exposed: as the number of LSTM layers increases, accuracy saturates and then degrades rapidly, much as in standard deep convolutional networks such as VGG. In this paper, we propose a novel attention-based framework, Residual Attention-based LSTM (Res-ATT), which not only takes advantage of an existing attention mechanism but also accounts for sentence-internal information that is usually lost in the transmission process. Our key novelty is showing how to integrate residual mapping into a hierarchical LSTM network to mitigate the degradation problem. More specifically, our hierarchical architecture builds on two LSTM layers, and residual mapping is introduced to avoid losing information about previously generated words (i.e., both content and relationship information). Experimental results on the mainstream datasets MSVD and MSR-VTT show that our framework outperforms state-of-the-art approaches. Furthermore, our automatically generated sentences provide more detailed information that precisely describes a video.
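The residual mapping described in the abstract amounts to a skip connection between the two LSTM layers: the lower layer's hidden state is added back onto the upper layer's output, so information about previously generated words is not lost between layers. The following is a minimal, hypothetical NumPy sketch of that idea only (untrained random weights, single time step); the actual Res-ATT model additionally applies temporal attention over video features, which is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal single-step LSTM cell (illustrative, untrained random weights)."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # Combined weight matrix for input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_size
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        g = np.tanh(z[2*H:3*H])    # candidate cell state
        o = sigmoid(z[3*H:4*H])    # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

hidden = 8
lstm1 = LSTMCell(hidden, hidden, seed=1)  # lower LSTM layer
lstm2 = LSTMCell(hidden, hidden, seed=2)  # upper LSTM layer

x = np.ones(hidden)                       # stand-in for a word/feature embedding
h1 = c1 = h2 = c2 = np.zeros(hidden)

h1, c1 = lstm1.step(x, h1, c1)            # lower layer step
h2, c2 = lstm2.step(h1, h2, c2)           # upper layer consumes lower output
# Residual mapping: add the lower layer's hidden state back to the upper
# layer's output, so the stacked network only has to learn the residual.
out = h2 + h1
```

The design intuition mirrors residual networks in CNNs: if the upper LSTM layer contributes nothing useful, `out` still carries the lower layer's state unchanged, which is what counteracts the degradation problem as depth grows.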

Bibliographic details

  • Source
    World Wide Web | 2019, Issue 2 | pp. 621-636 | 16 pages
  • Author affiliations

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Sichuan, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Sichuan, Peoples R China;

    Beijing Afanti Technol Co LTD, Beijing, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Sichuan, Peoples R China;

  • Indexing
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    LSTM; Attention mechanism; Residual thought; Video captioning;


