Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

A Sequential Guiding Network with Attention for Image Captioning

Abstract

Recent advances in deep learning for both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, making it possible to tackle more challenging tasks such as automatically generating descriptions of natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model extends the attention-based encoder-decoder framework with an additional guiding long short-term memory (LSTM) and can be trained end to end on image/description pairs. We validate our approach through extensive experiments on a benchmark dataset, MS COCO Captions. The proposed model achieves a significant improvement compared with other state-of-the-art deep learning models.
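The abstract does not give the model's equations, so the following is only an illustrative sketch of how a single decoding step of an attention-based encoder-decoder with an extra guiding LSTM might be wired. Everything in it is an assumption made for illustration rather than the authors' design: the function names (`lstm_step`, `soft_attention`, `decode_step`, `init_params`), the dimensions, the use of additive attention, the choice to feed the guiding LSTM the mean region feature and the previous decoder state, and the random weights standing in for trained parameters and CNN features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W has shape (4*H, len(x)+H); b has shape (4*H,)."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def soft_attention(regions, h, Wr, Wh, v):
    """Additive attention over region features (R, D) given hidden state h."""
    scores = np.tanh(regions @ Wr.T + Wh @ h) @ v      # shape (R,)
    weights = np.exp(scores - scores.max())            # stable softmax
    weights /= weights.sum()
    return weights @ regions, weights                  # context (D,), weights (R,)

def decode_step(word_emb, regions, state, p):
    """One word-generation step (hypothetical wiring, see lead-in)."""
    h_g, c_g, h_d, c_d = state
    # Assumed guiding LSTM input: mean image feature + previous decoder state.
    g_in = np.concatenate([regions.mean(axis=0), h_d])
    h_g, c_g = lstm_step(g_in, h_g, c_g, p["Wg"], p["bg"])
    # Attention over image regions, conditioned on the previous decoder state.
    context, weights = soft_attention(regions, h_d, p["Wr"], p["Wh"], p["v"])
    # Decoder LSTM sees the previous word, the attended context, and the guide.
    d_in = np.concatenate([word_emb, context, h_g])
    h_d, c_d = lstm_step(d_in, h_d, c_d, p["Wd"], p["bd"])
    logits = p["Wo"] @ h_d                             # unnormalized word scores
    return logits, weights, (h_g, c_g, h_d, c_d)

def init_params(rng, D=8, E=6, H=5, A=4, V=10):
    """Random placeholder weights; a real model would learn these."""
    def mat(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "Wg": mat(4 * H, D + 2 * H), "bg": np.zeros(4 * H),
        "Wd": mat(4 * H, E + D + 2 * H), "bd": np.zeros(4 * H),
        "Wr": mat(A, D), "Wh": mat(A, H), "v": mat(A),
        "Wo": mat(V, H),
    }

rng = np.random.default_rng(0)
params = init_params(rng)
regions = rng.normal(size=(3, 8))                 # stand-in for CNN region features
state = tuple(np.zeros(5) for _ in range(4))      # (h_g, c_g, h_d, c_d)
logits, weights, state = decode_step(rng.normal(size=6), regions, state, params)
```

Because every operation here is differentiable, a model wired this way could in principle be trained end to end on image/description pairs, as the abstract describes; the sketch covers only the forward pass of one step.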