IFAC PapersOnLine

Multi-step Greedy Reinforcement Learning Based on Model Predictive Control



Abstract

Reinforcement learning aims to compute optimal control policies with the help of data from closed-loop trajectories. Traditional model-free approaches need a huge number of data points to achieve acceptable performance, rendering them inapplicable in most real-world situations, even when the data can be obtained from a detailed simulator. Model-based reinforcement learning approaches try to leverage model knowledge to drastically reduce the amount of data needed, or to enforce important constraints on the closed-loop operation, the lack of which is another important drawback of model-free approaches. This paper proposes a novel model-based reinforcement learning approach. The main novelty is that we exploit all the information of a model predictive control (MPC) computation step, not only the first input that is actually applied to the plant, to efficiently learn a good approximation of the state value function. This approximation can then be included in an MPC formulation as a terminal cost with a short prediction horizon, achieving performance similar to that of an MPC with a very long prediction horizon. Simulation results for a discretized batch bioreactor illustrate the potential of the proposed methodology.
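To make the idea concrete, below is a minimal sketch (not the authors' code) of this multi-step use of an MPC solve. It assumes a toy linear-quadratic plant, so the long-horizon MPC reduces to an exact Riccati recursion; every state along the predicted trajectory, paired with its cost-to-go, becomes a training target for a quadratic value approximation, which then serves as the terminal cost of a short-horizon MPC. All names (`mpc_rollout`, `collect_value_targets`) and the system matrices are illustrative assumptions; the paper itself uses a discretized batch bioreactor and a general nonlinear MPC.

```python
# Sketch: learn a terminal value function from full MPC trajectories, then
# reuse it in a short-horizon MPC. Toy LQ setting; all quantities assumed.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical linear dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.01]])     # stage cost x'Qx + u'Ru

def mpc_rollout(x0, N, P_term):
    """Finite-horizon LQR, i.e. exact MPC for this linear-quadratic toy:
    backward Riccati pass with terminal cost x'P_term x, forward rollout."""
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = P_term
    for k in range(N - 1, -1, -1):
        S = R + B.T @ P[k + 1] @ B
        K[k] = np.linalg.solve(S, B.T @ P[k + 1] @ A)
        P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])
    xs, costs, x = [x0], [], x0
    for k in range(N):
        u = -K[k] @ x
        costs.append(float(x @ Q @ x + u @ R @ u))
        x = A @ x + B @ u
        xs.append(x)
    return xs, costs

def collect_value_targets(x0, N):
    """One long-horizon solve yields N+1 (state, cost-to-go) training pairs,
    not just the first applied input -- the key multi-step idea."""
    xs, costs = mpc_rollout(x0, N, P_term=Q)
    tail = float(xs[-1] @ Q @ xs[-1])        # cost-to-go at the horizon end
    pairs = []
    for k in reversed(range(N + 1)):
        pairs.append((xs[k], tail))
        if k > 0:
            tail += costs[k - 1]             # accumulate stage costs backwards
    return pairs

def fit_quadratic_value(pairs):
    """Least-squares fit of V(x) = x' M x to the collected cost-to-go data."""
    Phi = np.array([[x[0] ** 2, 2 * x[0] * x[1], x[1] ** 2] for x, _ in pairs])
    y = np.array([v for _, v in pairs])
    a, b, c = np.linalg.lstsq(Phi, y, rcond=None)[0]
    return np.array([[a, b], [b, c]])

rng = np.random.default_rng(0)
data = []
for _ in range(20):                           # a few long-horizon solves
    data += collect_value_targets(rng.uniform(-1, 1, size=2), N=50)
M = fit_quadratic_value(data)

# A 5-step MPC with the learned terminal cost now mimics the 50-step MPC.
xs_short, _ = mpc_rollout(np.array([1.0, 0.0]), N=5, P_term=M)
```

The design point this sketch illustrates: a single N-step solve produces N+1 value targets instead of one, which is what lets the learned terminal cost converge with far fewer closed-loop interactions than a model-free method would need.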
