IEEE International Conference on Tools with Artificial Intelligence

Enhanced soft attention mechanism with an inception-like module for image captioning



Abstract

Visual soft attention has been widely adopted in image captioning models. The Traditional Soft Attention Mechanism (TSAM) assigns a weight to each region using a multilayer perceptron whose input is only that region's own features. Because image classification networks extract regional features at fixed spatial locations, TSAM fails to adequately consider the spatial context of each region, which leads to an unreasonable weight distribution. In this paper, we introduce a flexible and universal attention framework with an inception-like module, named the Enhanced Soft Attention Mechanism (ESAM), which balances the attention levels of adjacent regions and alleviates the problem caused by local features with weak representational ability. Furthermore, we add an LSTM to the attention module so that it takes the previous attention distribution into account while generating the current word. Experimental results show that ESAM surpasses TSAM by 4.1% on BLEU-4 and 2.7% on CIDEr, and achieves better results when its universality is verified under the same experimental setups.
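
For context, the per-region weighting that TSAM performs is commonly formulated as standard visual soft attention; the notation below is illustrative and not taken from the paper. An MLP f_att scores each region feature a_i against the previous decoder state h_{t-1}, and a softmax turns the scores into weights:

e_{t,i} = f_{\mathrm{att}}(a_i, h_{t-1}), \quad \alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k} \exp(e_{t,k})}, \quad z_t = \sum_{i} \alpha_{t,i}\, a_i

Because each score e_{t,i} depends only on a_i, neighbouring regions do not influence one another's weights, which is the spatial-context problem the abstract describes.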
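Below is a minimal sketch of how the two ideas in the abstract could be combined, assuming a PyTorch-style implementation. The module name, layer sizes, and the specific branch kernels are illustrative assumptions, not the paper's architecture: parallel convolutions of different kernel sizes refine the per-region attention logits ("inception-like" smoothing across adjacent regions), and an LSTM cell over the flattened attention map carries the previous attention distribution into the current step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionLikeAttention(nn.Module):
    """Hypothetical sketch: soft attention whose logits are refined by
    parallel convolutions with different kernel sizes, so adjacent regions
    influence each other's weights; an LSTM cell over the flattened logit
    map lets the current step see the previous attention distribution."""
    def __init__(self, feat_dim=2048, hid_dim=512, grid=7):
        super().__init__()
        self.grid = grid
        self.score = nn.Linear(feat_dim + hid_dim, 1)        # per-region logit, as in TSAM
        # parallel branches with different receptive fields over the logit map
        self.branch1 = nn.Conv2d(1, 1, kernel_size=1)
        self.branch3 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(1, 1, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(3, 1, kernel_size=1)
        # LSTM cell carrying the previous attention distribution
        self.att_rnn = nn.LSTMCell(grid * grid, grid * grid)

    def forward(self, feats, dec_h, att_state=None):
        # feats: (B, grid*grid, feat_dim) region features
        # dec_h: (B, hid_dim) current decoder hidden state
        # att_state: (h, c) of the attention LSTM from the previous word, or None
        B, R, _ = feats.shape
        h = dec_h.unsqueeze(1).expand(-1, R, -1)
        logits = self.score(torch.cat([feats, h], dim=-1)).view(B, 1, self.grid, self.grid)
        smoothed = self.fuse(torch.cat(
            [self.branch1(logits), self.branch3(logits), self.branch5(logits)], dim=1))
        # mix in the previous attention distribution via the LSTM state
        a_h, a_c = self.att_rnn(smoothed.view(B, -1), att_state)
        alpha = F.softmax(a_h, dim=-1)                        # (B, grid*grid) weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)    # weighted region features
        return context, alpha, (a_h, a_c)
```

In this reading, the 1x1/3x3/5x5 branches let each region's logit mix with its neighbours at several scales before the softmax, which is one plausible interpretation of the "inception-like" balancing of adjacent regions described above.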
