Journal: Robotics and Autonomous Systems

End-to-end nonprehensile rearrangement with deep reinforcement learning and simulation-to-reality transfer



Abstract

Nonprehensile rearrangement is the problem of controlling a robot to interact with objects through pushing actions in order to reconfigure the objects into a predefined goal pose. In this work, we rearrange one object at a time in an environment with obstacles using an end-to-end policy that maps raw pixels as visual input to control actions without any form of engineered feature extraction. To reduce the amount of training data that needs to be collected using a real robot, we propose a simulation-to-reality transfer approach. In the first step, we model the nonprehensile rearrangement task in simulation and use deep reinforcement learning to learn a suitable rearrangement policy, which requires on the order of hundreds of thousands of example actions for training. Thereafter, we collect a small dataset of only 70 episodes of real-world actions as supervised examples for adapting the learned rearrangement policy to real-world input data. In this process, we make use of newly proposed strategies for improving the reinforcement learning process, such as heuristic exploration and the curation of a balanced set of experiences. We evaluate our method both in simulation and in a real-world setting using a Baxter robot to show that the proposed approach can effectively improve the training process in simulation, as well as efficiently adapt the learned policy to the real-world application, even when the camera pose is different from simulation. Additionally, we show that the learned system can not only provide adaptive behavior to handle unforeseen events during execution, such as distraction objects, sudden changes in positions of the objects, and obstacles, but can also deal with obstacle shapes that were not present in the training process. © 2019 Elsevier B.V. All rights reserved.
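The abstract mentions "the curation of a balanced set of experiences" but does not say how the balance is defined. The Python sketch below shows one possible reading, not the authors' method: it assumes that "balanced" means keeping positive-reward and other transitions in separate pools and drawing roughly half of each training batch from each pool. The class name `BalancedReplayBuffer` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class BalancedReplayBuffer:
    """Hypothetical replay buffer with one pool per outcome type.

    Assumption: a transition counts as "positive" when its reward is > 0.
    The abstract does not state the actual balancing criterion.
    """

    def __init__(self, capacity_per_pool=50_000, seed=None):
        # Each pool drops its oldest transitions once it is full.
        self.positive = deque(maxlen=capacity_per_pool)
        self.other = deque(maxlen=capacity_per_pool)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        pool = self.positive if reward > 0 else self.other
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw about half the batch from each pool.

        If one pool is too small, the shortfall is filled from the other,
        so the batch may contain fewer than batch_size items only when
        both pools together hold fewer than batch_size transitions.
        """
        n_pos = min(batch_size // 2, len(self.positive))
        n_other = min(batch_size - n_pos, len(self.other))
        # Top up from the positive pool if the other pool ran short.
        n_pos = min(batch_size - n_other, len(self.positive))
        batch = self.rng.sample(list(self.positive), n_pos)
        batch += self.rng.sample(list(self.other), n_other)
        self.rng.shuffle(batch)
        return batch
```

With this layout, rare successful pushes are not crowded out of a batch by the far more common unsuccessful ones, which is one plausible motivation for balancing; the actual criterion used in the paper may differ.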
