
Image captioning via hierarchical attention mechanism and policy gradient optimization



Abstract

Automatically generating a description of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence that bridges the gap between computer vision and natural language processing. Building on successful deep learning models, especially CNNs and Long Short-Term Memory networks (LSTMs) with attention mechanisms, we propose a hierarchical attention model that uses both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to address the exposure bias problem that arises in RNN-based supervised training on language tasks. In addition, the discriminator in the GAN framework automatically measures the consistency between the generated caption and the image content, and RL optimization uses this measurement to make the final generated sentences more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics on the MSCOCO dataset, using only greedy inference. © 2019 Elsevier B.V. All rights reserved.
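The abstract names policy gradient optimization with a discriminator-derived reward but gives no formulas. As a rough illustration only, the sketch below shows a generic REINFORCE-style update for one sampled caption. The function names, the scalar `reward` (standing in for a discriminator score), and the `baseline` are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss_and_grads(step_logits, sampled_tokens, reward, baseline=0.0):
    """Generic REINFORCE loss for one sampled caption (illustrative only).

    loss = -(reward - baseline) * sum_t log p(token_t)

    step_logits    : list of per-step vocabulary logits (hypothetical decoder output)
    sampled_tokens : token index sampled at each step
    reward         : scalar score for the whole caption (assumed; e.g. a
                     discriminator's consistency score)
    baseline       : scalar used to reduce variance (assumed)
    Returns the loss and its gradient with respect to each step's logits.
    """
    advantage = reward - baseline
    loss = 0.0
    grads = []
    for logits, token in zip(step_logits, sampled_tokens):
        probs = softmax(logits)
        loss -= advantage * math.log(probs[token])
        # d(-A * log p[token]) / d logit_i = A * (p_i - 1[i == token])
        grads.append([
            advantage * (p - (1.0 if i == token else 0.0))
            for i, p in enumerate(probs)
        ])
    return loss, grads
```

Because the model is trained on captions it samples itself, rather than only on ground-truth prefixes, an update of this kind is one common way to reduce exposure bias. A caption that scores above the baseline has its sampled tokens made more likely, and one that scores below has them made less likely.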

Bibliographic details

  • Source
    Signal Processing | 2020, Issue 2 | 107329.1-107329.12 | 12 pages
  • Author affiliations

    Queens Univ Belfast Sch Elect Elect Engn & Comp Sci Belfast Antrim North Ireland;

    Inst Adv Artificial Intelligence Nanjing Nanjing Jiangsu Peoples R China|Horizon Robot Beijing Peoples R China|Chinese Acad Sci Inst Automat Beijing Peoples R China|East China Normal Univ Sch Comp Sci & Software Engn Shanghai Peoples R China;

    Univ Liverpool Elect Engn & Elect Liverpool Merseyside England|Xian Jiaotong Liverpool Univ Dept Comp Sci & Software Engn Suzhou Peoples R China;

    Univ Liverpool Elect Engn & Elect Liverpool Merseyside England;

    Xian Jiaotong Liverpool Univ Dept Comp Sci & Software Engn Suzhou Peoples R China;

    Inst Adv Artificial Intelligence Nanjing Nanjing Jiangsu Peoples R China|Horizon Robot Beijing Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Image captioning; Hierarchical attention mechanism; Generative adversarial network; Reinforcement learning; Policy gradient;

