VPE: Variational Policy Embedding for Transfer Reinforcement Learning



Abstract

Reinforcement learning methods can solve complex problems, but the resulting policies may perform poorly in environments that differ even slightly from the training environment. In robotics especially, training and deployment conditions often vary and data collection is expensive, making retraining undesirable. Training in simulation allows for feasible training times, but suffers from a reality gap when applied in real-world settings. This raises the need for efficient adaptation of policies acting in new environments. We consider the problem of transferring knowledge within a family of similar Markov decision processes. We assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space, rather than the space of all policies. The low-dimensional space and master policy found by our method enable policies to quickly adapt to new environments. We demonstrate the method on both a pendulum swing-up task in simulation and simulation-to-real transfer on a pushing task.
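The idea of identifying a policy for a new task by searching only over a low-dimensional latent variable can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the quadratic `Q(s, a, z) = -(s + a - z)^2`, the one-step task with a hidden goal, the function names, and the plain grid search, which stands in for whatever latent-space search the authors actually use.

```python
import numpy as np

def master_policy(state, z):
    """Greedy action of a hypothetical master Q-function Q(s, a, z) = -(s + a - z)^2."""
    return z - state  # the action a that maximizes -(s + a - z)^2

def rollout_return(z, task_goal, states):
    """Average one-step reward of the master policy on a task with hidden goal `task_goal`."""
    actions = master_policy(states, z)
    rewards = -(states + actions - task_goal) ** 2  # toy reward, assumed for illustration
    return float(np.mean(rewards))

def identify_latent(task_goal, candidates, states):
    """Return the candidate latent value whose induced policy earns the highest return."""
    returns = [rollout_return(z, task_goal, states) for z in candidates]
    return float(candidates[int(np.argmax(returns))])

# The policy is never retrained: only the scalar latent z is searched.
states = np.linspace(-1.0, 1.0, 11)       # evaluation states
candidates = np.linspace(-1.0, 1.0, 21)   # grid over the 1-D latent space
best_z = identify_latent(task_goal=0.7, candidates=candidates, states=states)
print(best_z)
```

In this sketch the search space has one dimension regardless of how the policy itself is parameterized, which is the property the abstract attributes to the latent-space formulation.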


Bibliographic details

Source
2019 International Conference on Robotics and Automation | 2019 | pp. 36-42 | 7 pages
Conference location: Montreal (CA)
Authors
Isac Atnekvist; Danica Kragic; Johannes A. Stork
Author affiliations

Authors are with the Robotics, Perception, and Learning lab, Royal Institute of Technology, Sweden;

Original format: PDF
Language: English
Keywords
Optimization; Training; Task analysis; Robots; Adaptation models; Reinforcement learning; Supervised learning;


Similar literature


1. Towards learning transferable embeddings for protein conformations using Variational Autoencoders [J]. Alexandra-Ioana Albu. Procedia Computer Science. 2021, issue a
2. Adaptable automation with modular deep reinforcement learning and policy transfer [J]. Zohreh Raziei, Mohsen Moghaddam. Engineering Applications of Artificial Intelligence. 2021, issue Auga
3. DECAF: Deep Case-based Policy Inference for knowledge transfer in Reinforcement Learning [J]. Glatt Ruben, Da Silva Felipe Leno, da Costa Bianchi Reinaldo Augusto. Expert Systems with Applications. 2020, issue Octa
4. VPE: Variational Policy Embedding for Transfer Reinforcement Learning [C]. Isac Atnekvist, Danica Kragic, Johannes A. Stork. International Conference on Robotics and Automation. 2019
5. Bayesian Methods for Knowledge Transfer and Policy Search in Reinforcement Learning [D]. Wilson, Aaron. 2012
6. Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning [O]. Jiang Hua, Liangcai Zeng, Gongfa Li. 2021
7. VPE: Variational Policy Embedding for Transfer Reinforcement Learning [O]. Isac Atnekvist, Danica Kragic, Johannes A. Stork. 2019
