Mining richer visual features and analyzing the contextual information of an image for the decoding stage has become a challenging problem in image captioning. Some recent works employ external knowledge bases to obtain additional semantic relationships between objects by constructing a scene graph; however, pre-training the scene graph is time-consuming, and these manually defined relationships may not be comprehensive. In this paper, a novel hierarchical decoding method with latent context is proposed for image captioning, which analyzes visual context information and decodes multi-level visual features hierarchically to produce more accurate caption words. In the proposed method, a novel Latent Context Generation Network (LCGN) infers latent relationships between objects without any external knowledge and, at the same time, constructs a context vector containing rich neighbor information for each object. A graph convolutional network with attention then further aggregates the latent context information, combining object features with their context vectors to obtain high-level context features. Finally, hierarchical decoding based on a Triple Long Short-Term Memory (Tri-LSTM) network decodes global features, local features, and object features in turn, gradually analyzing the image content from the whole scene, to local regions, to individual objects. Experiments on the MSCOCO dataset show that the proposed method achieves highly competitive results in image captioning and outperforms most CNN-RNN architecture methods.
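To illustrate the latent-context idea described above, the following minimal Python sketch builds, for each object, a context vector as an attention-weighted sum of the other objects' features, using no external knowledge. This is an assumption-laden sketch, not the paper's implementation: the relation-scoring function (a plain dot product here, rather than the LCGN's learned scoring), the feature dimensions, and the names `latent_context` and `softmax` are all hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def latent_context(object_feats):
    """For each object, build a context vector as an attention-weighted
    sum of the other objects' features (no external knowledge base)."""
    contexts = []
    for i, f_i in enumerate(object_feats):
        neighbors = [f_j for j, f_j in enumerate(object_feats) if j != i]
        # Latent relation score: here simply the dot product f_i . f_j;
        # the paper's learned scoring function is not given in the abstract.
        weights = softmax([dot(f_i, f_j) for f_j in neighbors])
        # Weighted sum of neighbor features, dimension by dimension.
        ctx = [sum(w * f_j[d] for w, f_j in zip(weights, neighbors))
               for d in range(len(f_i))]
        contexts.append(ctx)
    return contexts

# Three toy 2-D object features; each context is a convex combination
# of the other two objects' features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctxs = latent_context(feats)
```

In the full model, these context vectors would be concatenated or fused with the object features and passed through the graph convolutional network with attention to yield the high-level context features used by the decoder.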