Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy

van Rooijen J. C.; Grondman I.; Babuska R.

Abstract

Reinforcement learning (RL) is a framework that enables a controller to find an optimal control policy for a task in an unknown environment. Although RL has been successfully used to solve optimal control problems, learning is generally slow. The main causes are the inefficient use of information collected during interaction with the system and the inability to use prior knowledge of the system or the control task. In addition, the learning speed depends heavily on the learning rate parameter, which is difficult to tune. In this paper, we present a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm. The main difference between VGBP and other frequently used algorithms, such as Sarsa, is that in VGBP the learning agent has direct access to the reward function, rather than just the immediate reward values. Furthermore, the agent learns a process model. This enables the algorithm to select control actions by optimizing over the right-hand side of the Bellman equation. We demonstrate fast learning convergence in simulations and experiments with the underactuated pendulum swing-up task. In addition, we present experimental results for a more complex 2-DOF robotic manipulator. © 2014 Elsevier Ltd. All rights reserved.
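The action-selection idea mentioned in the abstract, optimizing over the right-hand side of the Bellman equation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the quadratic `reward`, the linear `model` standing in for a learned process model, the quadratic `value` standing in for a learned value function, the discount factor `GAMMA`, and the use of a plain grid search over candidate actions in place of whatever optimization the authors actually use.

```python
import numpy as np

GAMMA = 0.97  # discount factor (assumed value, not from the paper)


def reward(x, u):
    """Known reward function (hypothetical): quadratic state/action penalty."""
    return -(x @ x) - 0.1 * u * u


def model(x, u):
    """Stand-in for a learned process model (hypothetical): x' = A x + B u."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.0, 0.1])
    return A @ x + B * u


def value(x):
    """Stand-in for a learned value function estimate (hypothetical)."""
    return -2.0 * (x @ x)


def select_action(x, candidates):
    """Return the candidate u that maximizes r(x, u) + GAMMA * V(f(x, u))."""
    scores = [reward(x, u) + GAMMA * value(model(x, u)) for u in candidates]
    return candidates[int(np.argmax(scores))]


# Grid of candidate control actions (assumed bounds and resolution).
candidates = np.linspace(-3.0, 3.0, 61)
x = np.array([0.5, 1.0])  # example state: position 0.5, velocity 1.0
u = select_action(x, candidates)
```

Because the agent can evaluate the reward function and the model for any candidate action, it can compare actions without trying each one on the real system, which is the property the abstract contrasts with algorithms such as Sarsa that only observe immediate reward values.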

Bibliographic record

Source: Mechatronics: The Science of Intelligent Machines, 2014, Issue 8, 9 pages
Authors: van Rooijen J. C.; Grondman I.; Babuska R.
Original format: PDF
Language: English
CLC classification: General issues
Keywords: Reinforcement learning; Process model; Robotics; Local linear regression; Least squares temporal difference
Date added to database: 2022-08-18 11:46:25
