Parallel-fusion LSTM with synchronous semantic and visual information for image captioning

Zhang Jing; Li Kangkang; Wang Zhe

Abstract

To synchronously combine dynamic semantic and visual information in the decoder of an image captioning model, we propose a novel parallel-fusion LSTM (pLSTM) structure. Two parallel LSTMs, one driven by image attributes and the other by visual information, are fused through their hidden states at every time step, so that the attributes and visual information complement or reinforce each other and yield more accurate captions. Depending on how semantic information is integrated from the attribute LSTM into the visual LSTM, we propose two models: pLSTM with attention (pLSTM-A) and pLSTM with guiding (pLSTM-G). pLSTM-A automatically captures the crucial semantic and visual information for generating captions, while pLSTM-G uses synchronous semantic information to directly adjust the hidden state of the visual LSTM toward the critical region. To verify the effectiveness of pLSTM, we conduct a series of experiments on the MSCOCO and Flickr30K datasets, where it outperforms some state-of-the-art image captioning methods.
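The abstract does not give the fusion equations, so the following is only a minimal NumPy sketch of the general idea: two LSTMs run in parallel, and their hidden states are combined at every time step. The softmax-weighted fusion, the parameter names (`W_a`, `W_v`, `w_att`), the helper `init_params`, and the choice to feed each LSTM the word embedding concatenated with a fixed feature vector are all assumptions made for illustration, not the authors' pLSTM-A or pLSTM-G formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One standard LSTM step. W: (4*H, D+H), b: (4*H,), x: (D,)."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def attention_fuse(h_attr, h_vis, w):
    """Hypothetical fusion: softmax weights over the two hidden states."""
    scores = np.array([w @ h_attr, w @ h_vis])
    scores = scores - scores.max()             # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha[0] * h_attr + alpha[1] * h_vis, alpha

def init_params(E, A, V, H, seed=0):
    """Random toy parameters (illustrative only, nothing is trained)."""
    rng = np.random.default_rng(seed)
    return {
        "H": H,
        "W_a": rng.normal(scale=0.1, size=(4 * H, E + A + H)),
        "b_a": np.zeros(4 * H),
        "W_v": rng.normal(scale=0.1, size=(4 * H, E + V + H)),
        "b_v": np.zeros(4 * H),
        "w_att": rng.normal(scale=0.1, size=H),
    }

def run_parallel_fusion(attr_feat, vis_feat, word_embs, params):
    """Run the attribute LSTM and visual LSTM side by side, fusing per step."""
    H = params["H"]
    h_a, c_a = np.zeros(H), np.zeros(H)
    h_v, c_v = np.zeros(H), np.zeros(H)
    fused_states = []
    for e in word_embs:
        # Both LSTMs see the same word embedding at the same time step.
        h_a, c_a = lstm_step(np.concatenate([e, attr_feat]), h_a, c_a,
                             params["W_a"], params["b_a"])
        h_v, c_v = lstm_step(np.concatenate([e, vis_feat]), h_v, c_v,
                             params["W_v"], params["b_v"])
        h_fused, _ = attention_fuse(h_a, h_v, params["w_att"])
        fused_states.append(h_fused)   # would feed a word classifier
    return np.stack(fused_states)

params = init_params(E=4, A=3, V=5, H=6)
rng = np.random.default_rng(1)
states = run_parallel_fusion(rng.normal(size=3), rng.normal(size=5),
                             rng.normal(size=(7, 4)), params)
print(states.shape)  # one fused hidden state per time step
```

In a full captioning model the fused state at each step would typically be projected to vocabulary logits; that layer, and any training procedure, are omitted here.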


Bibliographic details

Source: Journal of Visual Communication & Image Representation, 2021, No. 2, pp. 103044.1-103044.9 (9 pages)
Authors: Zhang Jing; Li Kangkang; Wang Zhe
Affiliation (all three authors): East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai, Peoples R China
Original format: PDF
Language: English
Keywords: Image captioning; Parallel-fusion LSTM; Attention mechanism; Guiding LSTM

Similar literature

1. Hierarchical LSTMs with Adaptive Attention for Visual Captioning [J]. Gao Lianli, Li Xiangpeng, Song Jingkuan. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, No. 5.
2. A neural image captioning model with caption-to-images semantic constructor [J]. Su Jinsong, Tang Jialong, Lu Ziyao. Neurocomputing, 2019, No. Nova20.
3. DAA: Dual LSTMs with adaptive attention for image captioning [J]. Xiao Fen, Gong Xue, Zhang Yiming. Neurocomputing, 2019, No. Octa28.
4. A parallel-fusion RNN-LSTM architecture for image caption generation [C]. Minsi Wang, Li Song, Xiaokang Yang. IEEE International Conference on Image Processing, 2016.
5. Visual Semantic Complex Network for Web Images [D]. Qiu, Shi. 2014.
6. Automated Semantic Indexing of Figure Captions to Improve Radiology Image Retrieval [O]. Charles E. Kahn Jr., Daniel L. Rubin. 2009.
7. phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning [O]. Tan, Ying Hua; Chan, Chee Seng. 2017.
