Journal: Robotics and Autonomous Systems

Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning



Abstract

The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged contrast drastically with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed or stiffness. Learning a control policy to actuate such robots is characterized by the search for a single solution to the task, with a representation of the policy consisting of moving the robot through a set of points to follow a trajectory. In new environments such as homes and offices populated with humans, reproduction performance is assessed differently. These robots are expected to acquire rich motor skills that can be generalized to new situations, while behaving safely in the vicinity of users. Skills acquisition can no longer be guided by a single form of learning, and must instead combine different approaches to continuously create, adapt and refine policies. The family of search strategies based on expectation-maximization (EM) looks particularly promising for coping with these new requirements. The exploration can be performed directly in the policy parameter space, by refining the policy together with exploration parameters represented in the form of covariances. With this formulation, reinforcement learning (RL) can be extended to a multi-optima search problem in which several policy alternatives can be considered. We present here two applications exploiting EM-based exploration strategies, by considering parameterized policies based on dynamical systems, and by using Gaussian mixture models to search for multiple policy alternatives.
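
To make the idea of exploring directly in policy parameter space more concrete, the following is a minimal sketch of a generic reward-weighted, EM-style update in which a Gaussian over policy parameters has both its mean (the policy) and its covariance (the exploration) refined at each iteration. It is an illustration under stated assumptions, not the authors' algorithm: the function name em_policy_search, the arguments reward_fn, n_elite and min_var, and the toy reward are all hypothetical, and rewards are assumed to be non-negative so that they can serve as weights.

```python
import numpy as np


def em_policy_search(reward_fn, mean, cov, n_samples=50, n_iters=30,
                     n_elite=10, min_var=1e-6, seed=0):
    """Generic reward-weighted EM-style search (illustrative sketch only).

    reward_fn must return a non-negative scalar for a parameter vector.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    dim = mean.size
    for _ in range(n_iters):
        # Explore: sample policy parameters around the current policy,
        # using the current covariance as the exploration noise.
        thetas = rng.multivariate_normal(mean, cov, size=n_samples)
        rewards = np.array([reward_fn(theta) for theta in thetas])
        # Keep only the best samples (a simple form of importance sampling).
        elite = np.argsort(rewards)[-n_elite:]
        weights = rewards[elite] + 1e-12  # guard against an all-zero sum
        weights = weights / weights.sum()
        # Update: reward-weighted mean and covariance, both measured
        # relative to the previous mean.
        diff = thetas[elite] - mean
        new_mean = mean + weights @ diff
        cov = (diff * weights[:, None]).T @ diff + min_var * np.eye(dim)
        mean = new_mean
    return mean, cov


# Toy usage: a single optimum at `target` (hypothetical example reward).
target = np.array([1.0, -2.0])
mean, cov = em_policy_search(
    lambda theta: np.exp(-np.sum((theta - target) ** 2)),
    mean=np.zeros(2),
    cov=np.eye(2),
)
```

Because the covariance is estimated around the previous mean, it stays large along the direction in which the policy is still moving and shrinks once the samples cluster around an optimum. A multi-optima variant in the spirit of the abstract would replace the single Gaussian with a Gaussian mixture model, which this sketch does not attempt.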
