Journal of Intelligent & Robotic Systems: Theory & Application

Contextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller


Abstract

We investigate learning of flexible robot locomotion controllers, i.e., controllers that are applicable to multiple contexts, for example different walking speeds, various terrain slopes, or other physical properties of the robot. In our experiments, the context is the desired linear walking speed of the gait. Current approaches for learning the control parameters of biped locomotion controllers are typically applicable to only a single context: they can be used, for example, to learn a gait with the highest speed, the lowest energy consumption, or a combination of both. Our research question is: how can we obtain a flexible walking controller that controls the robot (near-)optimally for many different contexts? We achieve the desired flexibility by applying the recently developed contextual relative entropy policy search (REPS) method, which generalizes the robot walking controller to different contexts, where a context is described by a real-valued vector. In this paper we also extend the contextual REPS algorithm to learn a nonlinear policy over the contexts instead of a linear one; we call this extension RBF-REPS, as it uses radial basis functions. To validate our method, we perform three simulation experiments, including a walking experiment using a simulated NAO humanoid robot. The robot learns a policy to choose the controller parameters for a continuous set of forward walking speeds.
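
To make the linear versus nonlinear distinction concrete, the following minimal sketch maps a scalar context (a desired walking speed) to a vector of mean controller parameters. A linear policy would use the features [1, s]; the nonlinear variant replaces s with Gaussian radial basis function activations. The abstract does not give the feature design, so everything here is an illustrative assumption rather than the authors' implementation: the function names, the RBF centers and bandwidth, the weight values, and the two example controller parameters. The REPS weight update itself is not shown.

```python
import math


def rbf_features(context, centers, bandwidth):
    """Return [1, phi_1, ..., phi_K]: a bias term plus Gaussian RBF activations.

    The centers and bandwidth are hypothetical; the abstract does not specify them.
    """
    activations = [
        math.exp(-0.5 * ((context - c) / bandwidth) ** 2) for c in centers
    ]
    return [1.0] + activations


def policy_mean(context, weights, centers, bandwidth):
    """Mean controller parameters for a context.

    Each row of `weights` holds the coefficients of one controller parameter,
    so the mean is linear in the features but nonlinear in the context.
    """
    phi = rbf_features(context, centers, bandwidth)
    return [sum(w * f for w, f in zip(row, phi)) for row in weights]


# Hypothetical setup: three RBF centers over walking speeds (m/s) and
# two controller parameters with made-up weights.
centers = [0.0, 0.1, 0.2]
bandwidth = 0.05
weights = [
    [0.5, 0.0, 0.1, 0.3],   # illustrative "step length"-like parameter
    [1.0, 0.2, 0.0, -0.2],  # illustrative "step frequency"-like parameter
]

params = policy_mean(0.1, weights, centers, bandwidth)
```

In a contextual policy search setting, the weights would be fitted from weighted samples of (context, parameters, reward); here they are fixed by hand only to show how the context enters the policy.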
