2019 International Conference on Robotics and Automation

VPE: Variational Policy Embedding for Transfer Reinforcement Learning


Abstract

Reinforcement learning methods can solve complex problems, but the resulting policies may perform poorly in environments that differ even slightly from the training environment. In robotics especially, training and deployment conditions often vary and data collection is expensive, making retraining undesirable. Training in simulation allows for feasible training times but suffers from a reality gap when policies are applied in real-world settings. This raises the need for efficient adaptation of policies acting in new environments. We consider the problem of transferring knowledge within a family of similar Markov decision processes. We assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space, rather than the space of all policies. The low-dimensional space and master policy found by our method enable policies to quickly adapt to new environments. We demonstrate the method on both a pendulum swing-up task in simulation and simulation-to-real transfer on a pushing task.
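The idea of adapting by searching only in the latent space can be illustrated with a toy sketch. This is not the authors' implementation: the closed-form `q_value`, the one-dimensional task family, the discrete action grid, and the grid search over `candidates` are all assumptions made for illustration, and the variational learning of the generative mapping and posterior is left out entirely. The sketch only shows how a fixed latent-conditioned Q-function yields a master policy, and how a new task can be identified by scoring a handful of latent values instead of retraining a policy.

```python
import numpy as np

# Hypothetical discrete action set for the toy task family.
ACTIONS = np.linspace(-2.0, 2.0, 401)


def q_value(state, action, z):
    # Stand-in for a learned, latent-conditioned Q-function Q(s, a, z).
    # Assumption: the best action in task z is z * state.
    return -(action - z * state) ** 2


def master_policy(state, z):
    # Greedy action under the latent-conditioned Q-function.
    return ACTIONS[np.argmax(q_value(state, ACTIONS, z))]


def average_reward(z, true_z, states):
    # Act with latent guess z on a task whose (unknown) latent is true_z.
    actions = np.array([master_policy(s, z) for s in states])
    rewards = -(actions - true_z * states) ** 2
    return rewards.mean()


def adapt_latent(true_z, candidates, states):
    # Search the low-dimensional latent space only; the policy is never retrained.
    scores = [average_reward(z, true_z, states) for z in candidates]
    return candidates[int(np.argmax(scores))]


rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=50)
candidates = np.linspace(-1.5, 1.5, 31)
best_z = adapt_latent(true_z=0.7, candidates=candidates, states=states)
print(f"selected latent value: {best_z:.2f}")
```

In this toy setting the search space has 31 points regardless of how complex the policy is, which is the intuition behind adapting in the latent space rather than in the space of all policies.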
