Conference: International Conference on Robotics and Automation

Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning


Abstract

The physical design of a robot and the policy that controls its motion are inherently coupled, and should be determined according to the task and environment. In an increasing number of applications, data-driven and learning-based approaches, such as deep reinforcement learning, have proven effective at designing control policies. For most tasks, the only way to evaluate a physical design with respect to such control policies is empirical, i.e., by picking a design and training a control policy for it. Since training these policies is time-consuming, it is computationally infeasible to train separate policies for all possible designs as a means to identify the best one. In this work, we address this limitation by introducing a method that jointly optimizes over the physical design and control network. Our approach maintains a distribution over designs and uses reinforcement learning to optimize a control policy to maximize expected reward over the design distribution. We give the controller access to design parameters to allow it to tailor its policy to each design in the distribution. Throughout training, we shift the distribution towards higher-performing designs, eventually converging to a design and control policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel designs and walking gaits, outperforming baselines across different settings.
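
The loop described in the abstract (sample designs from a distribution, run a design-conditioned controller, then shift both the controller and the distribution toward higher reward) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar design, the Gaussian design distribution with a fixed spread, the linear controller, the hand-written `toy_reward` standing in for a physics simulator, and the plain score-function (REINFORCE-style) updates used in place of whatever deep reinforcement learning algorithm the authors employ.

```python
import numpy as np


def toy_reward(design, action):
    # Hypothetical stand-in for a simulator rollout: the best design is 2.0,
    # and the best action for any given design is 0.5 * design.
    return -(design - 2.0) ** 2 - (action - 0.5 * design) ** 2


def train(steps=3000, batch=64, lr_design=0.02, lr_policy=0.02, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 0.5   # design distribution N(mu, sigma^2); sigma kept fixed
    w, b = 0.0, 0.0        # controller mean action: w * design + b
    act_std = 0.3          # fixed exploration noise of the controller

    for _ in range(steps):
        # 1. Sample a batch of designs from the current design distribution.
        designs = rng.normal(mu, sigma, size=batch)

        # 2. The controller sees the design parameter and picks an action.
        mean_action = w * designs + b
        actions = mean_action + act_std * rng.normal(size=batch)

        # 3. Evaluate each (design, action) pair; subtract a batch baseline.
        rewards = toy_reward(designs, actions)
        advantage = rewards - rewards.mean()

        # 4. Score-function update of the controller parameters.
        score_action = (actions - mean_action) / act_std ** 2
        w += lr_policy * np.mean(advantage * score_action * designs)
        b += lr_policy * np.mean(advantage * score_action)

        # 5. Score-function update that shifts the design distribution
        #    toward designs that earned higher reward.
        score_design = (designs - mu) / sigma ** 2
        mu += lr_design * np.mean(advantage * score_design)

    return mu, w, b


if __name__ == "__main__":
    mu, w, b = train()
    print("design mean:", mu)
    print("action at design mean:", w * mu + b)
```

Because a single controller is shared across all sampled designs and takes the design as input, no separate policy has to be trained per design, which is the computational saving the abstract argues for. In this sketch the spread of the design distribution stays fixed for simplicity, so only its mean moves; narrowing it over time toward a single design would be a further step.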
