Journal: Neural Computing & Applications

Hierarchical decoding with latent context for image captioning


Abstract

Mining richer visual features and analyzing context information from the image for the decoding stage has become a challenging problem in image captioning. Some recent works draw on external knowledge bases to obtain additional semantic relationships between objects by constructing a scene graph, but pre-training the scene graph is time-consuming and the manually defined relationships may not be comprehensive. In this paper, a novel hierarchical decoding method with latent context is proposed for image captioning; it analyzes visual context information and decodes multi-level visual features hierarchically to produce more accurate caption words. In the proposed method, a novel Latent Context Generation Network (LCGN) infers latent relationships between objects without any external knowledge and, at the same time, constructs for each object a context vector that contains rich neighbor information. A graph convolutional network with attention then further aggregates the latent context information, combining object features with their context vectors to obtain high-level context features. Finally, hierarchical decoding based on a Triple Long Short-Term Memory (Tri-LSTM) is proposed to decode global features, local features and object features hierarchically, gradually analyzing the content of the image from the whole to the local regions to the individual objects. Experiments on the MSCOCO dataset show that the proposed method achieves highly competitive results in image captioning and outperforms most CNN-RNN architecture methods.
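
To make the idea of a per-object context vector concrete, here is a minimal sketch in plain Python. It is not the paper's LCGN: the abstract does not specify how latent relationships are scored, so this example assumes a simple dot-product similarity followed by a softmax, and assumes that object features and context vectors are combined by concatenation. The function names (`context_vectors`, `combine`) and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def context_vectors(objects):
    """Build one context vector per object from its neighbors.

    Assumption: the relation score between two objects is their
    dot product. Requires at least two objects.
    """
    contexts = []
    for i, query in enumerate(objects):
        # Every other object counts as a neighbor of object i.
        neighbors = [feat for j, feat in enumerate(objects) if j != i]
        weights = softmax([dot(query, feat) for feat in neighbors])
        # Attention-weighted sum of the neighbor features.
        ctx = [
            sum(w * feat[d] for w, feat in zip(weights, neighbors))
            for d in range(len(query))
        ]
        contexts.append(ctx)
    return contexts


def combine(objects, contexts):
    """Assumption: combine by concatenating feature and context vector."""
    return [obj + ctx for obj, ctx in zip(objects, contexts)]


# Toy 2-dimensional features for three detected objects (made-up values).
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = context_vectors(objects)
combined = combine(objects, contexts)
```

In this sketch each object attends over all other objects, so no predefined scene graph or external knowledge base is needed. A learned model would replace the raw dot product with trainable projections.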
