Integration of textual cues for fine-grained image captioning using deep CNN and LSTM

Gupta Neeraj; Jalal Anand Singh

Abstract

The automatic narration of a natural scene is an important capability in artificial intelligence that unites computer vision and natural language processing. Caption generation is a challenging task in scene understanding. Most state-of-the-art methods use deep convolutional neural network models to extract visual features of the entire image, and then use recurrent neural networks to exploit the parallel structures between images and sentences for image captioning. However, such models exploit only visual features for caption generation. This work investigates how fusing the text available in an image can give more fine-grained captioning of a scene. In this paper, we propose a model that incorporates a deep convolutional neural network and long short-term memory to boost the accuracy of image captioning by fusing text features available in an image with the visual features extracted in state-of-the-art methods. We validate the effectiveness of the proposed model on the benchmark datasets Flickr8k and Flickr30k. The experimental outcomes show that the proposed model outperforms the state-of-the-art methods for image captioning.
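
The fusion idea in the abstract can be pictured with a small sketch. The abstract does not say how the text features and visual features are combined, nor how they enter the LSTM, so everything below is an assumption made for illustration: fusion is done by plain concatenation, the feature vectors are hand-written placeholders rather than real CNN or text-recognition outputs, the LSTM cell omits bias terms, and the names `fuse_features`, `init_lstm` and `lstm_step` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def fuse_features(visual, textual):
    """Fuse two feature vectors by concatenation (an assumed fusion operator)."""
    return list(visual) + list(textual)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def init_lstm(input_size, hidden_size, seed=0):
    """Create small random weights for the four LSTM gates (no biases, for brevity)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Each gate acts on the concatenation [input; previous hidden state].
    return {gate: matrix(hidden_size, input_size + hidden_size)
            for gate in ("i", "f", "o", "g")}


def lstm_step(params, x, h, c):
    """One standard LSTM cell update: returns the new hidden and cell states."""
    z = list(x) + list(h)
    i = [sigmoid(v) for v in matvec(params["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(params["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(params["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(params["g"], z)]  # candidate cell values
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


# Placeholder inputs (hypothetical values, not real model outputs).
visual = [0.2, -0.1, 0.5, 0.0]  # stand-in for CNN features of the whole image
textual = [1.0, 0.0, 0.3]       # stand-in for features of text found in the image

fused = fuse_features(visual, textual)

hidden_size = 5
params = init_lstm(len(fused), hidden_size)
h = [0.0] * hidden_size
c = [0.0] * hidden_size

# Feed the fused vector to the LSTM as its first input.
h, c = lstm_step(params, fused, h, c)
```

In a full captioning model, the hidden state `h` would typically be projected onto a vocabulary to predict the next word, and the step would be repeated with word embeddings as inputs. Those parts are left out here because the abstract gives no details about them.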

Bibliographic record

Source
Neural computing & applications | 2020, Issue 24 | pp. 17899-17908 | 10 pages
Authors
Gupta Neeraj; Jalal Anand Singh
Author affiliation

GLA Univ, Dept Comp Engn & Applicat, Mathura, Uttar Pradesh, India

Original format: PDF
Language of text: English
Chinese Library Classification: Artificial neural network computers; Artificial intelligence theory
Keywords
Text saliency; Image captioning; Convolution neural network; Long short-term memory
