
Captioning with Language-Based Attention


Abstract

The goal of image captioning via machine learning is to automatically learn to provide a free-form description of an image, while focusing on the significant objects in the image. Inspired by recent work on attention in image captioning, we study in this paper different attention mechanisms within a deep learning setting. In contrast to previous research on attention models, which focuses on applying attention to the image modality, we introduce three language-based attention models. These language-based attention models, which we developed iteratively from simpler RNN- and LSTM-based baseline models, consist of two sub-networks: a deep recurrent neural network for the language modality and a convolutional neural network for the image modality. The language-based attention models learn a joint representation of the language and image modalities, given the image and the previous words in the caption. At test time, novel captions are produced from this learned distribution. We provide a comparative quantitative and qualitative analysis of our three language-based attention models, which outperform the simple baseline models. We validate the effectiveness of our attention models with state-of-the-art performance on the Flickr8k dataset.
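To make the idea of language-based attention concrete, the following is a minimal NumPy sketch of the core computation the abstract describes: given the hidden states for the previous words in the caption (the language modality) and a feature vector for the image (from a CNN), a bilinear scoring function produces attention weights over the words, which are then used to form a language context vector. The function name `language_attention`, the bilinear parameterization `W`, and all dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def language_attention(word_states, image_feat, W):
    """Attend over previous-word hidden states, conditioned on the image.

    word_states : (T, d_w) array, one RNN/LSTM state per previous word
    image_feat  : (d_i,) CNN feature vector for the image
    W           : (d_w, d_i) bilinear compatibility matrix (illustrative)
    Returns the weighted language context vector and the attention weights.
    """
    scores = word_states @ W @ image_feat   # (T,) word-image compatibility
    alphas = softmax(scores)                # attention distribution over words
    context = alphas @ word_states          # (d_w,) language context vector
    return context, alphas

# Toy usage: 5 previous words with 8-dim states, a 6-dim image feature.
rng = np.random.default_rng(0)
states = rng.standard_normal((5, 8))
img = rng.standard_normal(6)
W = rng.standard_normal((8, 6))
ctx, alphas = language_attention(states, img, W)
```

In a full captioning model, `ctx` would be concatenated with (or added to) the decoder input at each step, so the next-word distribution is conditioned jointly on the image and the attended caption history.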
