...
IEEE Transactions on Robotics

A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials


Abstract

Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word “big-data,” we refer to this challenge as “micro-data reinforcement learning.” In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.
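
The second strategy lends itself to a compact illustration. Below is a minimal sketch, in Python, of micro-data policy search via Bayesian optimization: a Gaussian-process surrogate of the expected return is refitted after every episode, and an upper-confidence-bound acquisition rule picks the next policy parameters, so only one real rollout is spent per trial. Here `run_episode` is a hypothetical stand-in for an expensive robot rollout (a noisy toy quadratic), and the scikit-learn GP and the UCB rule are generic choices, not the specific algorithms surveyed in the article.

    # Minimal sketch: micro-data policy search with a GP surrogate of the
    # expected return (Bayesian optimization). `run_episode` is hypothetical;
    # on a real robot it would be one physical rollout.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(0)

    def run_episode(theta):
        # Noisy toy return with optimum at theta = [0.5, -0.3]; stands in
        # for one expensive episode on the physical system.
        return -np.sum((theta - np.array([0.5, -0.3])) ** 2) \
               + 0.01 * rng.standard_normal()

    dim, lo, hi = 2, -1.0, 1.0
    X = rng.uniform(lo, hi, size=(3, dim))          # a few seed episodes
    y = np.array([run_episode(t) for t in X])

    for _ in range(12):                             # "a handful of trials"
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                      alpha=1e-4, normalize_y=True)
        gp.fit(X, y)                                # surrogate of E[return]
        cand = rng.uniform(lo, hi, size=(2000, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        theta = cand[np.argmax(mu + 2.0 * sigma)]   # UCB: query the model,
        X = np.vstack([X, theta])                   # not the real system,
        y = np.append(y, run_episode(theta))        # then pay one episode

    print("best parameters:", X[np.argmax(y)], "return:", y.max())

Swapping the GP over returns for a learned dynamics model, and rolling the policy out inside that model to estimate returns, gives the model-based PS variant of the same idea.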
