...
IEEE Transactions on Robotics

A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials


Abstract

Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word “big-data,” we refer to this challenge as “micro-data reinforcement learning.” In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.
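
The second strategy lends itself to a compact illustration. Below is a minimal sketch, in Python, of micro-data policy search via Bayesian optimization: a Gaussian-process surrogate of the expected return is refitted after every episode, and an upper-confidence-bound acquisition rule picks the next policy parameters, so only one real rollout is spent per trial. Here `run_episode` is a hypothetical stand-in for an expensive robot rollout (a noisy toy quadratic), and the scikit-learn GP and the UCB rule are generic choices, not the specific algorithms surveyed in the article.

    # Minimal sketch: micro-data policy search with a GP surrogate of the
    # expected return (Bayesian optimization). `run_episode` is hypothetical;
    # on a real robot it would be one physical rollout.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(0)

    def run_episode(theta):
        # Noisy toy return with optimum at theta = [0.5, -0.3]; stands in
        # for one expensive episode on the physical system.
        return -np.sum((theta - np.array([0.5, -0.3])) ** 2) \
               + 0.01 * rng.standard_normal()

    dim, lo, hi = 2, -1.0, 1.0
    X = rng.uniform(lo, hi, size=(3, dim))          # a few seed episodes
    y = np.array([run_episode(t) for t in X])

    for _ in range(12):                             # "a handful of trials"
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                      alpha=1e-4, normalize_y=True)
        gp.fit(X, y)                                # surrogate of E[return]
        cand = rng.uniform(lo, hi, size=(2000, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        theta = cand[np.argmax(mu + 2.0 * sigma)]   # UCB: query the model,
        X = np.vstack([X, theta])                   # not the real system,
        y = np.append(y, run_episode(theta))        # then pay one episode

    print("best parameters:", X[np.argmax(y)], "return:", y.max())

Swapping the GP over returns for a learned dynamics model, and rolling the policy out inside that model to estimate returns, gives the model-based PS variant of the same idea.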
