Policy transfer via modularity and reward guiding

Abstract

Non-prehensile manipulation, such as pushing, is an important function for robots to move objects and is sometimes preferred as an alternative to grasping. However, due to unknown frictional forces, pushing has proven to be a difficult task for robots. We explore the use of reinforcement learning to train a robot to robustly push an object. In order to deal with the sample complexity of training such a method, we train the pushing policy in simulation and then transfer this policy to the real world. In order to ease the transfer from simulation, we propose to use modularity to separate the learned policy from the raw inputs and outputs; rather than training "end-to-end," we decompose our system into modules and train only a subset of these modules in simulation. We further demonstrate that we can incorporate prior knowledge about the task into the state space and the reward function to speed up convergence. Finally, we introduce "reward guiding" to modify the reward function and further reduce the training time. We demonstrate, in both simulation and real-world experiments, that such an approach can be used to reliably push an object from many initial positions and orientations. Videos available at https://goo.gl/B7LtY3.
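
To make the two ideas in the abstract concrete, a modular pipeline in which only one module is trained in simulation, and a modified reward that speeds up learning, the following is a minimal, hypothetical Python sketch. The module names (perception_module, learned_policy, control_module), the pose-based state, and the distance-based shaping term in shaped_reward are assumptions made for illustration; the abstract does not specify the paper's interfaces or its exact "reward guiding" formulation.

```python
import numpy as np

# Hypothetical modular decomposition (module names are illustrative, not from the paper):
# a fixed perception module maps raw observations to a low-dimensional object pose, the
# learned policy maps poses to a high-level push command, and a fixed control module turns
# that command into robot motions. Only the middle module is trained in simulation, so it
# can be reused on the real robot behind real perception and control modules.

def perception_module(image):
    """Placeholder: estimate the object pose (x, y, theta) from a raw observation."""
    raise NotImplementedError("swap in a pose estimator or marker tracker")

def control_module(push_command, robot):
    """Placeholder: convert a high-level push command into low-level robot commands."""
    raise NotImplementedError("swap in the robot's position controller")

def learned_policy(object_pose, goal_pose, weights):
    """Tiny linear stub standing in for the trained pushing policy."""
    features = np.concatenate([object_pose, goal_pose])
    return weights @ features  # e.g. a 2-D push direction in the plane

def shaped_reward(object_pose, goal_pose, reached_goal, guide_coeff=0.1):
    """Sparse task reward plus a dense distance term that rewards progress to the goal.

    This is a generic reward-shaping pattern, not the paper's exact "reward guiding" rule.
    """
    sparse = 1.0 if reached_goal else 0.0
    dense = -guide_coeff * float(np.linalg.norm(object_pose[:2] - goal_pose[:2]))
    return sparse + dense

if __name__ == "__main__":
    obj = np.array([0.40, 0.10, 0.0])   # current object pose (x, y, theta)
    goal = np.array([0.60, 0.00, 0.0])  # target pose
    w = np.zeros((2, 6))                # untrained policy weights, for illustration only
    print(learned_policy(obj, goal, w), shaped_reward(obj, goal, reached_goal=False))
```

A dense, progress-based term of this kind gives the learner informative feedback long before the sparse goal reward is ever reached, which is the general reason such reward modifications can shorten training time.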