Journal: Neurocomputing

DAA: Dual LSTMs with adaptive attention for image captioning



Abstract

Image captioning enables people to better understand images through fine-grained analysis. Recently, the encoder-decoder architecture with an attention mechanism has achieved great success in image captioning and visual question answering. In this paper, we propose a new captioning algorithm that integrates two separate Long Short-Term Memory (LSTM) networks through an adaptive semantic attention model. In our approach, the first LSTM network is followed by an attention model that serves as a visual sentinel and can flexibly trade off between visual semantic regions and textual content. The second LSTM acts as a language model: it combines the hidden-state representation of the first LSTM with the attention context vector, then outputs the word sequence. The proposed model has been extensively evaluated on two large-scale datasets, MSCOCO and Flickr30k. Experimental results show that the proposed method attends more closely to visually salient regions and outperforms prior state-of-the-art approaches on multiple evaluation metrics. (C) 2019 Elsevier B.V. All rights reserved.
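The adaptive "visual sentinel" attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is a hedged sketch of the general sentinel-gated attention idea (as popularized in adaptive-attention captioning work): the sentinel vector competes with the image regions in a single softmax, and the resulting gate decides how much of the context comes from visual features versus the decoder's internal (textual) state. All weight names (`Wv`, `Wg`, `Ws`, `wh`) and dimensions are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(V, h, s, Wv, Wg, Ws, wh):
    """Sentinel-gated attention sketch (names/shapes are illustrative).

    V  : (k, d) image-region features
    h  : (d,)   decoder hidden state
    s  : (d,)   visual sentinel vector (from the first LSTM)
    Wv, Wg, Ws : (d, a) projection matrices; wh : (a,) scoring vector
    """
    # Attention scores over the k visual regions
    z = np.tanh(V @ Wv + h @ Wg) @ wh            # (k,)
    # Score for the sentinel, appended as a (k+1)-th candidate
    zs = np.tanh(s @ Ws + h @ Wg) @ wh           # scalar
    alpha = softmax(np.append(z, zs))            # (k+1,)
    beta = alpha[-1]                             # sentinel gate in [0, 1]
    c = alpha[:-1] @ V                           # visual context vector
    # Blend sentinel (textual) and visual context with the gate
    return beta * s + (1 - beta) * c, beta

# Demo with random features and weights
rng = np.random.default_rng(0)
k, d, a = 5, 8, 6
V = rng.standard_normal((k, d))
h = rng.standard_normal(d)
s = rng.standard_normal(d)
Wv, Wg, Ws = (rng.standard_normal((d, a)) for _ in range(3))
wh = rng.standard_normal(a)
c_hat, beta = adaptive_attention(V, h, s, Wv, Wg, Ws, wh)
```

In the paper's dual-LSTM layout, `c_hat` together with the first LSTM's hidden state would then feed the second (language-model) LSTM, which emits the next word.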

