Asian Conference on Computer Vision

Gated Hierarchical Attention for Image Captioning

Abstract

Attention modules connecting encoders and decoders have been widely applied to object recognition, image captioning, visual question answering, and neural machine translation, and significantly improve performance. In this paper, we propose a bottom-up gated hierarchical attention (GHA) mechanism for image captioning. Our proposed model employs a CNN as the decoder, which is able to learn different concepts at different layers; different concepts naturally correspond to different areas of an image. We therefore develop GHA, in which low-level concepts are merged into high-level concepts and, simultaneously, low-level attended features are passed to the top to make predictions. GHA significantly improves the performance of a model that applies only one level of attention; for example, the CIDEr score increases from 0.923 to 0.999, which is comparable to state-of-the-art models that employ attribute boosting and reinforcement learning (RL). We also conduct extensive experiments to analyze the CNN decoder and the proposed GHA, and we find that deeper decoders do not obtain better performance; moreover, as the convolutional decoder becomes deeper, the model is more likely to collapse during training.
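
The abstract describes the mechanism only at a high level. The following is a minimal PyTorch sketch of one plausible reading of bottom-up gated hierarchical attention, not the paper's implementation: every name (AttentionLevel, GatedHierarchicalAttention, the gate and merge layers) and the exact wiring of the gate are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLevel(nn.Module):
    """One attention level: soft attention over image regions given a query."""
    def __init__(self, feat_dim, query_dim, hidden_dim):
        super().__init__()
        self.proj_v = nn.Linear(feat_dim, hidden_dim)
        self.proj_q = nn.Linear(query_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, query):
        # regions: (B, R, feat_dim); query: (B, query_dim)
        e = self.score(torch.tanh(self.proj_v(regions) + self.proj_q(query).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)          # (B, R, 1) attention weights over regions
        return (alpha * regions).sum(dim=1)  # attended feature, (B, feat_dim)

class GatedHierarchicalAttention(nn.Module):
    """Stacks attention levels bottom-up. At each level a sigmoid gate scales
    the attended feature; the gated feature is both merged into the next
    level's query (low-level into high-level) and accumulated at the top
    for prediction. This wiring is an assumption, not the paper's spec."""
    def __init__(self, feat_dim, query_dim, hidden_dim, num_levels=3):
        super().__init__()
        self.levels = nn.ModuleList(
            AttentionLevel(feat_dim, query_dim, hidden_dim) for _ in range(num_levels))
        self.gates = nn.ModuleList(
            nn.Linear(query_dim + feat_dim, feat_dim) for _ in range(num_levels))
        self.merge = nn.Linear(feat_dim, query_dim)

    def forward(self, regions, query):
        # regions: (B, R, feat_dim); query: decoder state, (B, query_dim)
        top = 0.0
        q = query
        for attend, gate in zip(self.levels, self.gates):
            c = attend(regions, q)                              # attended feature at this level
            g = torch.sigmoid(gate(torch.cat([q, c], dim=-1)))  # gate values in (0, 1)
            gated = g * c                                       # gated low-level feature
            top = top + gated                                   # passed up for prediction
            q = q + self.merge(gated)                           # merged into next level's query
        return top                                              # (B, feat_dim)

# Example (shapes only, hypothetical sizes): 49 regions of 512-d features,
# a 256-d decoder state, batch of 2.
gha = GatedHierarchicalAttention(feat_dim=512, query_dim=256, hidden_dim=256)
context = gha(torch.randn(2, 49, 512), torch.randn(2, 256))  # -> (2, 512)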
