
GateCap: Gated spatial and semantic attention model for image captioning


Abstract

Visual attention has been widely used in deep image captioning models for its capacity to selectively align visual features with the corresponding words, i.e., word-to-region alignment. In many cases, however, existing attention modules fail to highlight task-related image regions because they lack high-level semantics, so effectively leveraging high-level semantics is non-trivial for advancing captioning models. To address these issues, we propose a gated spatial and semantic attention captioning model (GateCap), which adaptively fuses spatial attention features with semantic attention features. In particular, GateCap introduces two novel aspects: 1) spatial and semantic attention features are further enhanced via triple LSTMs in a divide-and-fuse learning manner, and 2) a context gate module reweighs spatial and semantic attention features in a fair manner. Benefiting from these, GateCap reduces the side effect that word-to-region misalignment at one time step has on subsequent word prediction, thereby alleviating the emergence of incorrect words during testing. Experiments on the MSCOCO dataset verify the efficacy of the proposed GateCap model in terms of both quantitative and qualitative results.

Bibliographic details

  • Source
    Multimedia Tools and Applications | 2020, Issue 18 | pp. 11531-11549 | 19 pages
  • Author affiliations

    Science and Technology on Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China;

    College of Computer, National University of Defense Technology, Changsha 410073, China; Institute for Quantum Information, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China;

    College of Computer, National University of Defense Technology, Changsha 410073, China; Institute for Quantum Information, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China;

    Science and Technology on Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China;

  • Format: PDF
  • Language: English
  • Keywords

    Semantic attention; Spatial attention; Context gate;


