Journal: Neurocomputing

DAA: Dual LSTMs with adaptive attention for image captioning



Abstract

Image captioning enables people to better understand images through fine-grained analysis. Recently, the encoder-decoder architecture with an attention mechanism has achieved great success in image captioning and visual question answering. In this paper, we propose a new captioning algorithm that integrates two separate Long Short-Term Memory (LSTM) networks through an adaptive semantic attention model. In our approach, the first LSTM network is followed by an attention model that serves as a visual sentinel and can flexibly trade off between visual semantic regions and textual content. The second LSTM acts as a language model: it combines the hidden-state representation of the first LSTM with the attention context vector, then outputs the word sequence. The proposed model has been extensively evaluated on two large-scale datasets, MSCOCO and Flickr30k. Experimental results show that the proposed method attends more closely to visually salient regions and outperforms prior state-of-the-art approaches on multiple evaluation metrics. (C) 2019 Elsevier B.V. All rights reserved.
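The adaptive "visual sentinel" attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is a hedged sketch of the general sentinel-gated attention idea (as popularized in adaptive-attention captioning work): the sentinel vector competes with the image regions in a single softmax, and the resulting gate decides how much of the context comes from visual features versus the decoder's internal (textual) state. All weight names (`Wv`, `Wg`, `Ws`, `wh`) and dimensions are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(V, h, s, Wv, Wg, Ws, wh):
    """Sentinel-gated attention sketch (names/shapes are illustrative).

    V  : (k, d) image-region features
    h  : (d,)   decoder hidden state
    s  : (d,)   visual sentinel vector (from the first LSTM)
    Wv, Wg, Ws : (d, a) projection matrices; wh : (a,) scoring vector
    """
    # Attention scores over the k visual regions
    z = np.tanh(V @ Wv + h @ Wg) @ wh            # (k,)
    # Score for the sentinel, appended as a (k+1)-th candidate
    zs = np.tanh(s @ Ws + h @ Wg) @ wh           # scalar
    alpha = softmax(np.append(z, zs))            # (k+1,)
    beta = alpha[-1]                             # sentinel gate in [0, 1]
    c = alpha[:-1] @ V                           # visual context vector
    # Blend sentinel (textual) and visual context with the gate
    return beta * s + (1 - beta) * c, beta

# Demo with random features and weights
rng = np.random.default_rng(0)
k, d, a = 5, 8, 6
V = rng.standard_normal((k, d))
h = rng.standard_normal(d)
s = rng.standard_normal(d)
Wv, Wg, Ws = (rng.standard_normal((d, a)) for _ in range(3))
wh = rng.standard_normal(a)
c_hat, beta = adaptive_attention(V, h, s, Wv, Wg, Ws, wh)
```

In the paper's dual-LSTM layout, `c_hat` together with the first LSTM's hidden state would then feed the second (language-model) LSTM, which emits the next word.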

