IEEE International Conference on Multimedia and Expo

Refining Attention: A Sequential Attention Model for Image Captioning

Abstract

Visual attention is widely applied to image captioning. Previous works feed visual attention and linguistic words together into a long short-term memory (LSTM) network, but neglect the sequential relation among the attention at different time steps during word prediction. Moreover, the degree of abstraction of visual attention usually differs from that of linguistic words. To address these issues, this work proposes a sequential attention model that handles visual attention by taking this sequential relation into account, so that the internal relation among the attention at successive word prediction steps is used to enhance the visual information during sentence decoding. Experimental results on the benchmark MSCOCO and Flickr30K datasets show that the proposed model performs well, reaching 108.1 on CIDEr and 34.9 on BLEU-4 for MSCOCO.
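The abstract does not give the architecture, so the following is only a minimal NumPy sketch of one possible reading of the idea: the attended visual context computed at each word prediction step is passed through its own small recurrent cell, so that attention at step t depends on attention at earlier steps. Every name, dimension, and the tanh recurrence itself is an assumption made for illustration and is not taken from the paper; the `queries` array stands in for decoder hidden states that a real captioning model would produce.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def soft_attention(regions, query, W_r, W_q, v):
    """Additive soft attention over image regions (a standard formulation).

    regions: (R, D) region features, query: (H,) decoder state.
    Returns the attended context (D,) and the attention weights (R,).
    """
    scores = np.tanh(regions @ W_r + query @ W_q) @ v  # (R,)
    weights = softmax(scores)
    return weights @ regions, weights


def attention_rnn_step(context, a_prev, U_c, U_a):
    """Hypothetical recurrent cell that links attention across time steps."""
    return np.tanh(context @ U_c + a_prev @ U_a)


def decode_attention_sequence(regions, queries, params):
    """Run attention at each step and carry a separate attention state."""
    a = np.zeros(params["U_a"].shape[0])
    refined, all_weights = [], []
    for q in queries:
        context, w = soft_attention(
            regions, q, params["W_r"], params["W_q"], params["v"]
        )
        # The attention state sees the current context and its own history.
        a = attention_rnn_step(context, a, params["U_c"], params["U_a"])
        refined.append(a)
        all_weights.append(w)
    return np.stack(refined), np.stack(all_weights)


# Toy sizes (assumed): regions, feature dim, decoder dim, attention dim,
# attention-state dim, number of decoding steps.
R, D, H, K, A, T = 5, 8, 6, 7, 4, 3
rng = np.random.default_rng(0)
params = {
    "W_r": rng.normal(size=(D, K)),
    "W_q": rng.normal(size=(H, K)),
    "v": rng.normal(size=K),
    "U_c": rng.normal(size=(D, A)),
    "U_a": rng.normal(size=(A, A)),
}
regions = rng.normal(size=(R, D))   # placeholder image region features
queries = rng.normal(size=(T, H))   # placeholder decoder hidden states

refined, weights = decode_attention_sequence(regions, queries, params)
```

In this sketch `refined[t]` is the attention state that would be combined with the word representation at step t, and each row of `weights` is a distribution over the image regions. A trained model would learn the parameters and would likely use an LSTM rather than a plain tanh cell.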
