Image captioning via hierarchical attention mechanism and policy gradient optimization

Yan Shiyang; Xie Yuan; Wu Fangyu; Smith Jeremy S.; Lu Wenjin; Zhang Bailing

Abstract

Automatically generating a description of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence that bridges computer vision and natural language processing. Building on successful deep learning models, in particular CNNs and Long Short-Term Memory (LSTM) networks with attention mechanisms, we propose a hierarchical attention model that uses both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to address the exposure bias problem in RNN-based supervised training for language tasks. In addition, by having the discriminator in the GAN framework automatically measure the consistency between the generated caption and the image content, and by applying RL optimization, we make the generated sentences more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics on the MSCOCO dataset, using only greedy inference. (C) 2019 Elsevier B.V. All rights reserved.
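
The abstract does not give the training equations, so the following is only a minimal sketch of policy-gradient optimization in general (REINFORCE with a constant baseline), not the authors' implementation. It treats per-step logits as the caption policy, samples a token sequence, and computes the gradient of the loss -(reward - baseline) * sum_t log p(token_t). The function names, the toy vocabulary, and `toy_reward` (a hypothetical stand-in for a discriminator score) are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_caption(step_logits, rng):
    """Sample one token index per time step from the policy."""
    tokens = []
    for logits in step_logits:
        probs = softmax(logits)
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def toy_reward(tokens):
    """Hypothetical reward in [0, 1]: the fraction of tokens equal to 0.
    In a GAN-based setup this would be replaced by a discriminator score."""
    return sum(1 for t in tokens if t == 0) / len(tokens)


def reinforce_grad(step_logits, tokens, reward, baseline=0.0):
    """Gradient of -(reward - baseline) * sum_t log p(token_t)
    with respect to the logits of every time step."""
    advantage = reward - baseline
    grads = []
    for logits, tok in zip(step_logits, tokens):
        probs = softmax(logits)
        # d(-log p[tok]) / d(logit_k) = p[k] - 1[k == tok]
        grads.append([advantage * (p - (1.0 if k == tok else 0.0))
                      for k, p in enumerate(probs)])
    return grads


def sgd_step(step_logits, grads, lr=0.1):
    """Apply one gradient-descent update to the logits in place."""
    for logits, g in zip(step_logits, grads):
        for k in range(len(logits)):
            logits[k] -= lr * g[k]


if __name__ == "__main__":
    rng = random.Random(0)
    policy = [[0.0] * 4 for _ in range(3)]  # 3 time steps, 4-word toy vocabulary
    caption = sample_caption(policy, rng)
    grads = reinforce_grad(policy, caption, toy_reward(caption), baseline=0.5)
    sgd_step(policy, grads)
    print(caption, policy)
```

When the reward exceeds the baseline, the update raises the logits of the sampled tokens; when it falls below, it lowers them. Because the reward is computed on sampled sequences rather than ground-truth prefixes, this style of training is one common way to sidestep exposure bias.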

Bibliographic details

Source
Signal Processing | 2020, Issue 2 | pp. 107329.1-107329.12 | 12 pages
Authors
Yan Shiyang; Xie Yuan; Wu Fangyu; Smith Jeremy S.; Lu Wenjin; Zhang Bailing
Author affiliations

Queens Univ Belfast Sch Elect Elect Engn & Comp Sci Belfast Antrim North Ireland;

Inst Adv Artificial Intelligence Nanjing Nanjing Jiangsu Peoples R China|Horizon Robot Beijing Peoples R China|Chinese Acad Sci Inst Automat Beijing Peoples R China|East China Normal Univ Sch Comp Sci & Software Engn Shanghai Peoples R China;

Univ Liverpool Elect Engn & Elect Liverpool Merseyside England|Xian Jiaotong Liverpool Univ Dept Comp Sci & Software Engn Suzhou Peoples R China;

Univ Liverpool Elect Engn & Elect Liverpool Merseyside England;

Xian Jiaotong Liverpool Univ Dept Comp Sci & Software Engn Suzhou Peoples R China;

Inst Adv Artificial Intelligence Nanjing Nanjing Jiangsu Peoples R China|Horizon Robot Beijing Peoples R China;

Original format: PDF
Language: English
Keywords
Image captioning; Hierarchical attention mechanism; Generative adversarial network; Reinforcement learning; Policy gradient;
