International Symposium on Robotics Research

AdaPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems



Abstract

Deep reinforcement learning (RL) has achieved remarkable advances in sequential decision making in recent years, often outperforming humans on tasks such as Atari games. However, model-free variants of deep RL are not directly applicable to physical systems because of their poor sample complexity: they often require millions of training examples gathered from an accurate model of the environment. One approach to using model-free RL methods on robotic systems is thus to train in a relatively accurate simulator (a source domain) and transfer the policy to the physical robot (a target domain). In practice, this naive transfer may perform arbitrarily badly, so online fine-tuning is often performed. During this fine-tuning, however, the robot may behave unsafely. It is therefore desirable for a system trained in a simulator with slight model inaccuracies to still perform well on the target system on the first iteration. We refer to this as the zero-shot policy transfer problem.
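The source/target gap described in the abstract can be illustrated with a toy experiment: tune a policy on nominal (source) dynamics, then deploy it zero-shot on perturbed (target) dynamics and compare costs. This is a minimal sketch of the problem setting only, not the paper's AdaPT method; the scalar linear system, parameters, and all names below are illustrative assumptions.

```python
import numpy as np

def rollout(policy_gain, a, b=1.0, noise_std=0.05, horizon=50, seed=0):
    """Run a linear-feedback policy u = -k*x on the stochastic system
    x' = a*x + b*u + w, and return the accumulated quadratic cost."""
    rng = np.random.default_rng(seed)
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        u = -policy_gain * x
        cost += x**2 + 0.1 * u**2
        x = a * x + b * u + rng.normal(0.0, noise_std)
    return cost

# "Train" in the source domain (simulator, a=1.0): grid-search the
# feedback gain that minimizes average cost over stochastic rollouts.
gains = np.linspace(0.0, 1.5, 31)
src_costs = [np.mean([rollout(k, a=1.0, seed=s) for s in range(10)])
             for k in gains]
k_star = gains[int(np.argmin(src_costs))]

# Zero-shot evaluation: deploy the same policy, with no fine-tuning,
# in a target domain whose dynamics are slightly misspecified (a=1.2).
src = np.mean([rollout(k_star, a=1.0, seed=s) for s in range(10)])
tgt = np.mean([rollout(k_star, a=1.2, seed=s) for s in range(10)])
print(f"gain={k_star:.2f}  source cost={src:.2f}  target cost={tgt:.2f}")
```

Even this small model mismatch degrades the transferred policy's cost on the target system, which is the gap that zero-shot adaptive transfer methods aim to close.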

