Journal of Visual Communication & Image Representation

Parallel-fusion LSTM with synchronous semantic and visual information for image captioning



Abstract

To synchronously combine dynamic semantic and visual information in the decoder of an image captioning model, we propose a novel parallel-fusion LSTM (pLSTM) structure in this paper. Two parallel LSTMs, one fed with image attributes and one with visual information, are fused through their hidden states at every time step, so that the attribute and visual information complement or reinforce each other and yield more accurate captions. Depending on how the semantic information from the attribute LSTM is integrated into the visual LSTM, we propose two models: pLSTM with attention (pLSTM-A) and pLSTM with guiding (pLSTM-G). pLSTM-A automatically captures the crucial semantic and visual information for generating captions, while pLSTM-G uses the synchronous semantic information to directly adjust the hidden state of the visual LSTM toward the critical region. To verify the effectiveness of the proposed pLSTM, we conduct a series of experiments on the MSCOCO and Flickr30K datasets, where the results outperform several state-of-the-art image captioning methods.
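The abstract describes the parallel fusion only at a high level, so the following pure-Python sketch is meant to illustrate the general idea: two LSTMs advance in lockstep and their hidden states are combined at every time step. The two fusion rules are our own assumptions, not the paper's equations: a softmax-weighted sum of the two hidden states stands in for pLSTM-A, and an element-wise sigmoid gate of the visual hidden state by the semantic hidden state (fed back into the visual LSTM) stands in for pLSTM-G. The names `LSTMCell`, `fuse_attention`, `fuse_guiding`, and `decode`, the random weights, and the fixed scoring vector `w_score` are all hypothetical. Word embeddings, region-level attention, and the vocabulary projection are omitted; each LSTM simply receives the same feature vector at every step.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Minimal LSTM cell with random weights and no biases (illustration only)."""

    def __init__(self, input_size, hidden_size, seed):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for g in "ifoc"
        }

    def step(self, x, h, c):
        z = x + h  # list concatenation: [input; previous hidden state]
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]
        g = [math.tanh(a) for a in matvec(self.W["c"], z)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def fuse_attention(h_sem, h_vis, w_score):
    # Hypothetical pLSTM-A-style fusion: score each hidden state,
    # softmax the two scores, and take the weighted sum.
    scores = [sum(w * x for w, x in zip(w_score, h)) for h in (h_sem, h_vis)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    alpha = [e / sum(exps) for e in exps]
    fused = [alpha[0] * s + alpha[1] * v for s, v in zip(h_sem, h_vis)]
    return fused, alpha


def fuse_guiding(h_sem, h_vis):
    # Hypothetical pLSTM-G-style fusion: the semantic state gates the visual state.
    return [sigmoid(s) * v for s, v in zip(h_sem, h_vis)]


def decode(attr_feat, vis_feat, steps, mode="attention", hidden=4, seed=0):
    attr_lstm = LSTMCell(len(attr_feat), hidden, seed)
    vis_lstm = LSTMCell(len(vis_feat), hidden, seed + 1)
    w_score = [0.5] * hidden  # hypothetical fixed scoring vector (learned in practice)
    h_a, c_a = [0.0] * hidden, [0.0] * hidden
    h_v, c_v = [0.0] * hidden, [0.0] * hidden
    fused_states = []
    for _ in range(steps):
        # Both LSTMs advance at the same time step (synchronous, parallel).
        h_a, c_a = attr_lstm.step(attr_feat, h_a, c_a)
        h_v, c_v = vis_lstm.step(vis_feat, h_v, c_v)
        if mode == "attention":
            fused, _ = fuse_attention(h_a, h_v, w_score)
        elif mode == "guiding":
            fused = fuse_guiding(h_a, h_v)
            h_v = fused  # the guided state is carried into the next visual step
        else:
            raise ValueError("mode must be 'attention' or 'guiding'")
        fused_states.append(fused)  # would feed a word classifier in a real model
    return fused_states


if __name__ == "__main__":
    states = decode([0.2, 0.9, 0.1], [0.5, -0.3, 0.8, 0.0], steps=3, mode="guiding")
    print(len(states), len(states[0]))  # 3 4
```

The design choice that separates the two variants in this sketch is where the semantic signal enters: the attention variant only mixes the two states at the output, whereas the guiding variant rewrites the visual LSTM's recurrent state, so the semantic information also shapes later visual steps.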
