
Captioning with Language-Based Attention


Abstract

The goal of image captioning via machine learning is to automatically learn to provide a free-form description of an image, while focusing on the significant objects in the image. Inspired by recent work on attention in image captioning, we study in this paper different attention mechanisms within a deep learning setting. In contrast to previous research on attention models, which focuses on applying attention to the image modality, we introduce three language-based attention models. These language-based attention models, which we developed iteratively from simpler RNN- and LSTM-based baseline models, consist of two sub-networks: a deep recurrent neural network for the language modality and a convolutional neural network for the image modality. The language-based attention models learn a joint representation of the language and image modalities, given the image and the previous words in the caption. At test time, novel captions are produced from this learned distribution. We provide a comparative quantitative and qualitative analysis of our three language-based attention models, which outperform the simple baseline models. We validate the effectiveness of our attention models with state-of-the-art performance on the Flickr8k dataset.
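To make the idea of language-based attention concrete, the following is a minimal NumPy sketch of the core computation the abstract describes: given the hidden states for the previous words in the caption (the language modality) and a feature vector for the image (from a CNN), a bilinear scoring function produces attention weights over the words, which are then used to form a language context vector. The function name `language_attention`, the bilinear parameterization `W`, and all dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def language_attention(word_states, image_feat, W):
    """Attend over previous-word hidden states, conditioned on the image.

    word_states : (T, d_w) array, one RNN/LSTM state per previous word
    image_feat  : (d_i,) CNN feature vector for the image
    W           : (d_w, d_i) bilinear compatibility matrix (illustrative)
    Returns the weighted language context vector and the attention weights.
    """
    scores = word_states @ W @ image_feat   # (T,) word-image compatibility
    alphas = softmax(scores)                # attention distribution over words
    context = alphas @ word_states          # (d_w,) language context vector
    return context, alphas

# Toy usage: 5 previous words with 8-dim states, a 6-dim image feature.
rng = np.random.default_rng(0)
states = rng.standard_normal((5, 8))
img = rng.standard_normal(6)
W = rng.standard_normal((8, 6))
ctx, alphas = language_attention(states, img, W)
```

In a full captioning model, `ctx` would be concatenated with (or added to) the decoder input at each step, so the next-word distribution is conditioned jointly on the image and the attended caption history.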
