JMLR: Workshop and Conference Proceedings

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement


Abstract

The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SF&GPI framework in two ways. One of the basic assumptions underlying the original formulation of SF&GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SF&GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SF&GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent’s set of skills, and thus be reused in the future.
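The SF&GPI machinery the abstract builds on can be summarised in two lines: if task rewards decompose as r(s, a, s') = phi(s, a, s') . w, then each stored policy pi_i is summarised by its successor features psi^{pi_i}(s, a) = E[ sum_t gamma^t phi_{t+1} ], its value on a new task is Q^{pi_i}(s, a) = psi^{pi_i}(s, a) . w, and GPI acts greedily with respect to the best of these values. Below is a minimal NumPy sketch of that action-selection step; it is not code from the paper, and the array shapes and function name are illustrative assumptions.

    import numpy as np

    def gpi_action(sf_at_state, w):
        """GPI action selection over a library of policies summarised by successor features.

        sf_at_state: array of shape (n_policies, n_actions, n_features); entry [i, a]
            is psi^{pi_i}(s, a), the expected discounted sum of features phi under
            stored policy pi_i, evaluated at the current state s.
        w: array of shape (n_features,); reward weights of the new task, under the
            assumption r(s, a, s') = phi(s, a, s') . w.
        """
        # Q^{pi_i}(s, a) = psi^{pi_i}(s, a) . w for every stored policy and action
        q_values = sf_at_state @ w              # shape (n_policies, n_actions)
        # GPI: act greedily with respect to the best stored policy at this state
        return int(q_values.max(axis=0).argmax())

    # Illustrative call: 3 stored policies, 4 actions, 5 reward features
    rng = np.random.default_rng(0)
    action = gpi_action(rng.normal(size=(3, 4, 5)), rng.normal(size=5))

The paper's first contribution relaxes the linear-reward assumption encoded in w above, and its second uses learned reward functions themselves in place of a hand-specified phi.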
