Journal: Multimedia Tools and Applications

Reference-based model using multimodal gated recurrent units for image captioning

Abstract

Describing images through natural language is a challenging task in the field of computer vision. Image captioning consists of creating image descriptions, which can be accomplished via deep learning architectures that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, traditional RNNs encounter problems such as exploding and vanishing gradients, and they often generate non-descriptive sentences. To address these issues, we propose a model based on the encoder-decoder structure that uses CNNs to extract image features and multimodal gated recurrent units (GRUs) to generate descriptions. The model incorporates part-of-speech (PoS) information and a likelihood function for weight generation in the GRU. The method performs knowledge transfer during a validation phase that uses the k-nearest neighbors (kNN) technique. Experimental results on the Flickr30k and MSCOCO datasets demonstrate that the proposed PoS-based model achieves competitive scores compared with state-of-the-art models. The system predicts more descriptive captions and closely approximates the expected captions, both in the predicted and in the kNN-selected captions.
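To make the decoder concrete, the sketch below implements a plain GRU cell in numpy, the recurrent unit the abstract builds on. The gating equations shown here are the standard GRU formulation; the paper's multimodal, PoS-weighted variant (which injects image features and PoS-derived gate weights) is not reproduced, so treat this as an illustrative baseline, not the authors' model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal standard GRU cell (illustrative sketch only; the paper's
    multimodal, PoS-weighted GRU adds image features and PoS-based
    gate weights on top of these equations)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        w = (hidden_size, input_size)   # input-to-hidden weight shape
        u = (hidden_size, hidden_size)  # hidden-to-hidden weight shape
        self.Wz, self.Uz = rng.uniform(-s, s, w), rng.uniform(-s, s, u)
        self.Wr, self.Ur = rng.uniform(-s, s, w), rng.uniform(-s, s, u)
        self.Wh, self.Uh = rng.uniform(-s, s, w), rng.uniform(-s, s, u)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)              # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)              # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))  # candidate state
        return (1.0 - z) * h + z * h_tilde                  # gated interpolation

# Toy usage: unroll a few steps over constant inputs.
cell = GRUCell(input_size=8, hidden_size=4)
h = np.zeros(4)
for _ in range(3):
    h = cell.step(np.ones(8), h)
print(h.shape)  # (4,)
```

The update gate `z` interpolates between the previous state and the candidate state, which is what lets gradients flow through long sequences and mitigates the vanishing-gradient problem mentioned above.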
