IEEE International Conference on Image Processing

A PARALLEL-FUSION RNN-LSTM ARCHITECTURE FOR IMAGE CAPTION GENERATION



Abstract

Models based on deep convolutional networks and recurrent neural networks have dominated recent image caption generation tasks. Performance and complexity remain eternal topics. Inspired by recent work, and combining the advantages of a simple RNN and an LSTM, we present a novel parallel-fusion RNN-LSTM architecture, which obtains better results than the dominant one and improves efficiency as well. The proposed approach divides the hidden units of the RNN into several same-size parts and lets them work in parallel. Then, we merge their outputs with corresponding ratios to generate the final results. Moreover, these units can be different types of RNNs, for instance, a simple RNN and an LSTM. By training normally with the NeuralTalk platform on the Flickr8k dataset, without additional training data, we obtain better results than those of the dominant structure; in particular, the proposed model surpasses GoogleNIC in image caption generation.
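The fusion scheme the abstract describes — several same-size recurrent units (e.g. a simple RNN and an LSTM) running in parallel on the same input, with their hidden states merged by fixed ratios — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: all weight initializations, the 0.5/0.5 merge ratios, and the cell equations are standard textbook forms assumed here, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleRNNCell:
    """Vanilla tanh RNN: h' = tanh(Wx x + Wh h + b)."""
    def __init__(self, in_dim, hid_dim):
        self.Wx = rng.normal(0, 0.1, (hid_dim, in_dim))
        self.Wh = rng.normal(0, 0.1, (hid_dim, hid_dim))
        self.b = np.zeros(hid_dim)

    def step(self, x, h):
        return np.tanh(self.Wx @ x + self.Wh @ h + self.b)

class LSTMCell:
    """Standard LSTM with input/forget/output gates and a candidate."""
    def __init__(self, in_dim, hid_dim):
        # Gate weights stacked along the first axis: i, f, o, g.
        self.Wx = rng.normal(0, 0.1, (4 * hid_dim, in_dim))
        self.Wh = rng.normal(0, 0.1, (4 * hid_dim, hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.Wx @ x + self.Wh @ h + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c_new = f * c + i * np.tanh(g)
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def parallel_fusion_step(x, rnn, lstm, h_rnn, h_lstm, c_lstm,
                         ratios=(0.5, 0.5)):
    """Run both same-size units on the same input in parallel and
    merge their hidden states with the given ratios (assumed values)."""
    h_rnn = rnn.step(x, h_rnn)
    h_lstm, c_lstm = lstm.step(x, h_lstm, c_lstm)
    fused = ratios[0] * h_rnn + ratios[1] * h_lstm
    return fused, h_rnn, h_lstm, c_lstm

# Unroll over a short dummy sequence (stand-in for caption tokens).
in_dim, hid_dim = 8, 16
rnn, lstm = SimpleRNNCell(in_dim, hid_dim), LSTMCell(in_dim, hid_dim)
h_rnn = h_lstm = c_lstm = np.zeros(hid_dim)
for _ in range(5):
    x = rng.normal(size=in_dim)
    fused, h_rnn, h_lstm, c_lstm = parallel_fusion_step(
        x, rnn, lstm, h_rnn, h_lstm, c_lstm)
print(fused.shape)  # (16,)
```

In a full caption model the fused state would feed a softmax over the vocabulary at each step; here the point is only that the two units share the input and timestep, and only the cheap ratio-weighted sum couples them.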

