A Sequential Guiding Network with Attention for Image Captioning

Abstract

Recent advances in deep learning for both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, making it possible to tackle more challenging tasks such as automatically generating descriptions of natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model extends the encoder-decoder framework with attention by adding a guiding long short-term memory (LSTM) network, and it can be trained end to end using image/description pairs. We validate our approach through extensive experiments on a benchmark dataset, MS COCO Captions. The proposed model achieves a significant improvement over other state-of-the-art deep learning models.
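
The abstract does not specify the form of the attention mechanism or how the guiding LSTM is wired in, so the following is only a generic illustration of one soft-attention step in an encoder-decoder captioner, not the authors' model. It assumes dot-product scoring between a decoder state and a set of image region features; the names `attend`, `decoder_state`, and `region_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, region_features):
    """One soft-attention step (generic sketch, not the paper's formulation).

    decoder_state: list of d floats, standing in for the decoder's hidden state.
    region_features: list of vectors of d floats, standing in for CNN region features.
    Returns (weights, context): one weight per region and the weighted feature sum.
    """
    # Assumption: dot-product scoring; the paper may use a different scoring function.
    scores = [
        sum(h * f for h, f in zip(decoder_state, feat))
        for feat in region_features
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # The context vector is the attention-weighted sum of region features.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy inputs: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, regions)
```

In a full captioner, the context vector would be fed to the decoder together with the previous word at every step; where a guiding signal would enter is left open here because the abstract does not say.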

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 3802-3806 (5 pages)
Authors
Daouda Sow; Zengchang Qin; Mouhamed Niasse; Tao Wan
Original format
PDF
Keywords
Semantics; Decoding; Task analysis; Adaptation models; Computational modeling; Visualization; Recurrent neural networks

Similar literature

1. High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention [J]. Zhang Zongjian, Wu Qiang, Wang Yang. IEEE Transactions on Multimedia, 2019, No. 7
2. Geospatial relation captioning for high-spatial-resolution images by using an attention-based neural network [J]. Chen Jie, Han Yarong, Wan Li. International Journal of Remote Sensing, 2019, No. 15a16
3. A Sequential Guiding Network with Attention for Image Captioning [C]. Daouda Sow, Zengchang Qin, Mouhamed Niasse. IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
4. Arabic Image Captioning Using Deep Learning with Attention [D]. Sabri, Sabri Monaf. 2021
5. Social Image Captioning: Exploring Visual Attention and User Attention [O]. Leiquan Wang, Xiaoliang Chu, Weishan Zhang. 2018
6. Fine-Grained and Semantic-Guided Visual Attention for Image Captioning [O]. Zongjian Zhang, Qiang Wu, Yang Wang. 2018
