
Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory



Abstract

Automatically describing the contents of an image in natural language has drawn much attention because it not only integrates computer vision and natural language processing but also has practical applications. Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning. The proposed model consciously refines image features based on previously generated text. By fine-tuning the parameters of the convolutional neural network, Bag-LSTM obtains more text-related image features via feedback propagation than other models. As opposed to existing guidance-LSTM methods, which directly add image features into each unit of an LSTM block, our fine-tuned model dynamically leverages more text-conditional image features, acquired by the semantic attention mechanism, as guidance information. Moreover, we exploit a bidirectional gLSTM as the caption generator, which is capable of learning long-term relations between visual features and semantic information by making use of both historical and future contextual information. In addition, variations of the Bag-LSTM model are proposed in an effort to sufficiently describe high-level visual-language interactions. Experiments on the Flickr8k and MSCOCO benchmark datasets demonstrate the effectiveness of the model compared with baseline algorithms; for example, it scores 51.2% higher than BRNN on the CIDEr metric.
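The abstract describes the core mechanism only at a high level: region-level CNN features are re-weighted by a semantic attention step conditioned on the current language state, and the resulting text-conditional feature is fed into each LSTM step as guidance. The PyTorch sketch below is not the authors' implementation; all module names and dimensions are illustrative assumptions, and only a single (forward) decoding direction is shown, whereas the paper uses a bidirectional guiding LSTM (gLSTM).

```python
# Minimal sketch of semantic-attention-guided LSTM decoding (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAttentionGuidedLSTM(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Attention scores image regions against the previous language (hidden) state.
        self.att_img = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_out = nn.Linear(hidden_dim, 1)
        # LSTM input = word embedding concatenated with the attended guidance feature.
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, words):
        # regions: (B, R, feat_dim) CNN region features; words: (B, T) token ids.
        B, T = words.shape
        h = regions.new_zeros(B, self.lstm.hidden_size)
        c = regions.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            # Semantic attention: weight image regions by the current language state,
            # producing a text-conditional image feature used as guidance.
            scores = self.att_out(torch.tanh(
                self.att_img(regions) + self.att_hid(h).unsqueeze(1)))  # (B, R, 1)
            alpha = F.softmax(scores, dim=1)
            guidance = (alpha * regions).sum(dim=1)                     # (B, feat_dim)
            x = torch.cat([self.embed(words[:, t]), guidance], dim=-1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.classifier(h))
        return torch.stack(logits, dim=1)                               # (B, T, vocab)
```

In the paper, two such decoders (forward and backward over the caption) would be combined so that both historical and future context inform each prediction; the sketch omits this and any CNN fine-tuning for brevity.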

Bibliographic Details

  • Source
    Neural Processing Letters | 2019, Issue 1 | pp. 103-119 | 17 pages
  • Author Affiliations

    Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China|Univ Chinese Acad Sci, Beijing 100049, Peoples R China|Chinese Acad Sci, Inst Automat, NLPR, Beijing 100190, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China;

    Dalian Univ Technol, Coll Comp Sci & Technol, Dalian 116024, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China|Jilin Univ, Zhuhai Coll, Minist Educ, Zhuhai Lab,Key Lab Symbol Computat & Knowledge En, Zhuhai 519041, Peoples R China;

    Univ Arkansas, MidSouth Bioinformat Ctr, Little Rock, AR 72204 USA|Univ Arkansas Little Rock & Univ Arkansas Med Sci, Joint Bioinformat PhD Program, Little Rock, AR 72204 USA;

    Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China|Jilin Univ, Zhuhai Coll, Minist Educ, Zhuhai Lab,Key Lab Symbol Computat & Knowledge En, Zhuhai 519041, Peoples R China|Univ Arkansas, MidSouth Bioinformat Ctr, Little Rock, AR 72204 USA|Univ Arkansas Little Rock & Univ Arkansas Med Sci, Joint Bioinformat PhD Program, Little Rock, AR 72204 USA;

  • Indexing Information
  • Original Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Image captioning; Semantic attention mechanism; Convolution neural network; Bidirectional guiding LSTM;

