The physical design of a robot and the policy that controls its motion are inherently coupled. However, existing approaches largely ignore this coupling, instead choosing to alternate between separate design and control phases, which requires expert intuition throughout and risks convergence to suboptimal designs. In this work, we propose a method that jointly optimizes over the physical design of a robot and the corresponding control policy in a model-free fashion, without any need for expert supervision. Given an arbitrary robot morphology, our method maintains a distribution over the design parameters and uses reinforcement learning to train a neural network controller. Throughout training, we refine the robot distribution to maximize the expected reward. This results in an assignment to the robot parameters and neural network policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel robot designs and walking gaits for several different morphologies, achieving performance comparable to or better than that of hand-crafted designs.
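The joint optimization described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scalar design parameter drawn from a Gaussian, a one-parameter linear-Gaussian policy in place of a neural network, a made-up quadratic reward in place of a locomotion simulator, and plain score-function (REINFORCE-style) gradient updates. The names `reward`, `train`, `mu`, `theta`, and all numeric constants are hypothetical.

```python
import random


def reward(design, action):
    # Hypothetical toy reward (an assumption, not from the paper):
    # highest when design = 1.5 and action = 2 * design.
    return -(design - 1.5) ** 2 - (action - 2.0 * design) ** 2


def train(iterations=2000, batch=64, lr=0.02, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.5, 0.2       # design distribution: d ~ N(mu, sigma^2)
    theta, sigma_a = 0.0, 0.2  # policy: a ~ N(theta * d, sigma_a^2)

    for _ in range(iterations):
        # Sample designs, act with the shared policy, record rewards.
        samples = []
        for _ in range(batch):
            d = rng.gauss(mu, sigma)
            a = rng.gauss(theta * d, sigma_a)
            samples.append((d, a, reward(d, a)))

        # Batch-mean baseline to reduce the variance of the estimates.
        baseline = sum(r for _, _, r in samples) / batch

        grad_mu = 0.0
        grad_theta = 0.0
        for d, a, r in samples:
            adv = r - baseline
            # Score function of the design distribution w.r.t. mu.
            grad_mu += adv * (d - mu) / sigma ** 2
            # Score function of the policy w.r.t. theta.
            grad_theta += adv * (a - theta * d) * d / sigma_a ** 2

        # Gradient ascent on expected reward for design and policy together.
        mu += lr * grad_mu / batch
        theta += lr * grad_theta / batch

    return mu, theta
```

Calling `train()` returns the final design mean and policy gain. In this toy setting both are updated from the same sampled rollouts, so the design distribution shifts toward designs that the current policy controls well while the policy adapts to the designs being sampled.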