IEEE International Conference on Tools with Artificial Intelligence

Enhanced soft attention mechanism with an inception-like module for image captioning



Abstract

Visual soft attention has been widely adopted in image captioning models. The Traditional Soft Attention Mechanism (TSAM) assigns a weight to each region using a multilayer perceptron whose input is only that region's own features. Because image classification networks extract regional features at fixed spatial locations, TSAM fails to adequately consider the spatial context of each region, which leads to an unreasonable weight distribution. In this paper, we introduce a flexible and universal attention framework with an inception-like module, named the Enhanced Soft Attention Mechanism (ESAM), which balances the attention levels of adjacent regions and alleviates the problem caused by local features with weak representational ability. Furthermore, we add an LSTM to the attention module so that it takes the previous attention distribution into account while generating the current word. Experimental results show that ESAM surpasses TSAM by 4.1% on BLEU-4 and 2.7% on CIDEr, and achieves better results when its universality is verified under the same experimental setups.
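
For context, the per-region weighting that TSAM performs is commonly formulated as standard visual soft attention; the notation below is illustrative and not taken from the paper. An MLP f_att scores each region feature a_i against the previous decoder state h_{t-1}, and a softmax turns the scores into weights:

e_{t,i} = f_{\mathrm{att}}(a_i, h_{t-1}), \quad \alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k} \exp(e_{t,k})}, \quad z_t = \sum_{i} \alpha_{t,i}\, a_i

Because each score e_{t,i} depends only on a_i, neighbouring regions do not influence one another's weights, which is the spatial-context problem the abstract describes.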
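Below is a minimal sketch of how the two ideas in the abstract could be combined, assuming a PyTorch-style implementation. The module name, layer sizes, and the specific branch kernels are illustrative assumptions, not the paper's architecture: parallel convolutions of different kernel sizes refine the per-region attention logits ("inception-like" smoothing across adjacent regions), and an LSTM cell over the flattened attention map carries the previous attention distribution into the current step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionLikeAttention(nn.Module):
    """Hypothetical sketch: soft attention whose logits are refined by
    parallel convolutions with different kernel sizes, so adjacent regions
    influence each other's weights; an LSTM cell over the flattened logit
    map lets the current step see the previous attention distribution."""
    def __init__(self, feat_dim=2048, hid_dim=512, grid=7):
        super().__init__()
        self.grid = grid
        self.score = nn.Linear(feat_dim + hid_dim, 1)        # per-region logit, as in TSAM
        # parallel branches with different receptive fields over the logit map
        self.branch1 = nn.Conv2d(1, 1, kernel_size=1)
        self.branch3 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(1, 1, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(3, 1, kernel_size=1)
        # LSTM cell carrying the previous attention distribution
        self.att_rnn = nn.LSTMCell(grid * grid, grid * grid)

    def forward(self, feats, dec_h, att_state=None):
        # feats: (B, grid*grid, feat_dim) region features
        # dec_h: (B, hid_dim) current decoder hidden state
        # att_state: (h, c) of the attention LSTM from the previous word, or None
        B, R, _ = feats.shape
        h = dec_h.unsqueeze(1).expand(-1, R, -1)
        logits = self.score(torch.cat([feats, h], dim=-1)).view(B, 1, self.grid, self.grid)
        smoothed = self.fuse(torch.cat(
            [self.branch1(logits), self.branch3(logits), self.branch5(logits)], dim=1))
        # mix in the previous attention distribution via the LSTM state
        a_h, a_c = self.att_rnn(smoothed.view(B, -1), att_state)
        alpha = F.softmax(a_h, dim=-1)                        # (B, grid*grid) weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)    # weighted region features
        return context, alpha, (a_h, a_c)
```

In this reading, the 1x1/3x3/5x5 branches let each region's logit mix with its neighbours at several scales before the softmax, which is one plausible interpretation of the "inception-like" balancing of adjacent regions described above.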
