IEEE International Conference on Image Processing

A PARALLEL-FUSION RNN-LSTM ARCHITECTURE FOR IMAGE CAPTION GENERATION



Abstract

Models based on deep convolutional networks and recurrent neural networks have dominated recent image caption generation tasks. Performance and complexity remain eternal topics. Inspired by recent work, and combining the advantages of a simple RNN and an LSTM, we present a novel parallel-fusion RNN-LSTM architecture, which obtains better results than the dominant one and improves efficiency as well. The proposed approach divides the hidden units of the RNN into several same-size parts and lets them work in parallel. Then, we merge their outputs with corresponding ratios to generate the final results. Moreover, these units can be different types of RNNs, for instance, a simple RNN and an LSTM. By training normally with the NeuralTalk platform on the Flickr8k dataset, without additional training data, we obtain better results than those of the dominant structure; in particular, the proposed model surpasses GoogleNIC in image caption generation.
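The fusion scheme the abstract describes — several same-size recurrent units (e.g. a simple RNN and an LSTM) running in parallel on the same input, with their hidden states merged by fixed ratios — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: all weight initializations, the 0.5/0.5 merge ratios, and the cell equations are standard textbook forms assumed here, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleRNNCell:
    """Vanilla tanh RNN: h' = tanh(Wx x + Wh h + b)."""
    def __init__(self, in_dim, hid_dim):
        self.Wx = rng.normal(0, 0.1, (hid_dim, in_dim))
        self.Wh = rng.normal(0, 0.1, (hid_dim, hid_dim))
        self.b = np.zeros(hid_dim)

    def step(self, x, h):
        return np.tanh(self.Wx @ x + self.Wh @ h + self.b)

class LSTMCell:
    """Standard LSTM with input/forget/output gates and a candidate."""
    def __init__(self, in_dim, hid_dim):
        # Gate weights stacked along the first axis: i, f, o, g.
        self.Wx = rng.normal(0, 0.1, (4 * hid_dim, in_dim))
        self.Wh = rng.normal(0, 0.1, (4 * hid_dim, hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.Wx @ x + self.Wh @ h + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c_new = f * c + i * np.tanh(g)
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def parallel_fusion_step(x, rnn, lstm, h_rnn, h_lstm, c_lstm,
                         ratios=(0.5, 0.5)):
    """Run both same-size units on the same input in parallel and
    merge their hidden states with the given ratios (assumed values)."""
    h_rnn = rnn.step(x, h_rnn)
    h_lstm, c_lstm = lstm.step(x, h_lstm, c_lstm)
    fused = ratios[0] * h_rnn + ratios[1] * h_lstm
    return fused, h_rnn, h_lstm, c_lstm

# Unroll over a short dummy sequence (stand-in for caption tokens).
in_dim, hid_dim = 8, 16
rnn, lstm = SimpleRNNCell(in_dim, hid_dim), LSTMCell(in_dim, hid_dim)
h_rnn = h_lstm = c_lstm = np.zeros(hid_dim)
for _ in range(5):
    x = rng.normal(size=in_dim)
    fused, h_rnn, h_lstm, c_lstm = parallel_fusion_step(
        x, rnn, lstm, h_rnn, h_lstm, c_lstm)
print(fused.shape)  # (16,)
```

In a full caption model the fused state would feed a softmax over the vocabulary at each step; here the point is only that the two units share the input and timestep, and only the cheap ratio-weighted sum couples them.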

