2019 International Conference on Robotics and Automation

BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning


Abstract

Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high dimensional systems, but its relatively poor sample complexity often necessitates training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naive exploration strategies.
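The curriculum idea in the abstract (start training near the goal, then grow the set of start states backwards once the policy performs well enough) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 1-D dynamics `dx/dt = u` with `|u| <= max_speed`, the interval representation of the start set, the `threshold` value, and the `train_and_evaluate` callback that stands in for any model-free policy optimizer. The abstract does not say how reachable sets are computed, so simple interval arithmetic stands in for that step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of 1-D states (hypothetical start-set representation)."""
    lo: float
    hi: float

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi


def backward_reachable(starts: Interval, horizon: float,
                       max_speed: float, bounds: Interval) -> Interval:
    """States that can reach `starts` within `horizon` seconds under the
    assumed approximate model dx/dt = u, |u| <= max_speed, clipped to `bounds`."""
    reach = max_speed * horizon
    return Interval(max(bounds.lo, starts.lo - reach),
                    min(bounds.hi, starts.hi + reach))


def run_curriculum(goal: Interval, initial_state: float, bounds: Interval,
                   train_and_evaluate: Callable[[Interval], float],
                   horizon: float = 0.5, max_speed: float = 1.0,
                   threshold: float = 0.8, max_iters: int = 100) -> List[Interval]:
    """Grow the start-state set backwards from the goal.

    `train_and_evaluate(starts)` is a placeholder: it should train the policy
    from states sampled in `starts` and return a success rate in [0, 1].
    Returns the sequence of start sets used by the curriculum.
    """
    starts = goal
    history = [starts]
    for _ in range(max_iters):
        if starts.contains(initial_state):
            break  # the curriculum now covers the true initial state
        success_rate = train_and_evaluate(starts)
        if success_rate >= threshold:
            # Expand only after the policy demonstrates sufficient performance.
            starts = backward_reachable(starts, horizon, max_speed, bounds)
            history.append(starts)
    return history


if __name__ == "__main__":
    # Stub optimizer that always succeeds, so the set expands every iteration.
    stages = run_curriculum(Interval(-0.1, 0.1), initial_state=2.0,
                            bounds=Interval(-3.0, 3.0),
                            train_and_evaluate=lambda s: 1.0)
    for stage in stages:
        print(stage)
```

In a real setting the stub would be replaced by a few iterations of a policy-gradient method, and the interval expansion by a reachability computation on the approximate dynamics model.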
