...
IEEE Transactions on Multimedia

Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning


Abstract

Image captioning is one of the most challenging tasks in AI because it requires an understanding of both complex visuals and natural language. Because image captioning is essentially a sequential prediction task, recent advances have used reinforcement learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods rely primarily on a single policy network and a single reward function, an approach that is not well matched to the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To solve this problem, we propose a novel multi-level policy and reward RL framework for image captioning that can be easily integrated with RNN-based captioning models, language metrics, or visual-semantic functions for optimization. Specifically, the proposed framework includes two modules: 1) a multi-level policy network that jointly updates the word- and sentence-level policies for word generation; and 2) a multi-level reward function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Furthermore, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments on the MSCOCO and Flickr30k datasets, together with the accompanying analyses, show that the proposed framework achieves competitive performance on a variety of evaluation metrics. In addition, we conduct ablation studies on multiple variants of the proposed framework and explore several representative image captioning models and language metrics for the word-level policy network and the language-language reward function, respectively, to evaluate the generalization ability of the proposed framework.
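To make the training loop described above more concrete, the sketch below is a minimal, hedged illustration of the general idea: a word-level policy samples a caption and is updated with a REINFORCE-style gradient whose reward sums a language-language term and a vision-language term. It deliberately omits the sentence-level policy and the guidance term, and every module size, token convention, and reward function here (WordPolicy, language_reward, vision_language_reward) is an illustrative assumption, not the authors' implementation.

```python
# Hedged sketch: word-level policy trained with a combined
# vision-language + language-language reward. All names, sizes,
# and reward definitions are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, MAX_LEN = 1000, 64, 128, 16

class WordPolicy(nn.Module):
    """Word-level policy: an LSTM decoder conditioned on image features."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTMCell(EMB + HID, HID)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feat):
        B = img_feat.size(0)
        h = c = torch.zeros(B, HID)
        tok = torch.zeros(B, dtype=torch.long)   # assumed <BOS> id = 0
        logps, toks = [], []
        for _ in range(MAX_LEN):
            x = torch.cat([self.embed(tok), img_feat], dim=-1)
            h, c = self.lstm(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()                  # sample the next word
            logps.append(dist.log_prob(tok))
            toks.append(tok)
        return torch.stack(toks, 1), torch.stack(logps, 1)

def language_reward(sampled, reference):
    """Language-language reward stand-in (a real system would use e.g. CIDEr).
    Here: per-position token overlap, purely illustrative."""
    return (sampled == reference).float().mean(dim=1)

def vision_language_reward(img_feat, caption_feat):
    """Vision-language reward stand-in: cosine similarity between image
    features and a (stub) caption embedding."""
    return F.cosine_similarity(img_feat, caption_feat, dim=-1)

# --- one illustrative training step ---------------------------------------
policy = WordPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

img_feat = torch.randn(4, HID)                           # fake image features
reference = torch.randint(0, VOCAB, (4, MAX_LEN))        # fake reference caption

tokens, logps = policy(img_feat)
# Stub caption embedding obtained by projecting mean word embeddings.
caption_feat = policy.embed(tokens).mean(dim=1) @ torch.randn(EMB, HID)
reward = language_reward(tokens, reference) + vision_language_reward(img_feat, caption_feat)

baseline = reward.mean()                                 # simple batch-mean baseline
loss = -((reward - baseline).detach().unsqueeze(1) * logps).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

In the paper's actual framework, the sentence-level policy and the guidance term would shape how the two reward levels steer the word-level updates; in this sketch a plain batch-mean baseline stands in for that machinery.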
