...
Journal: IFAC PapersOnLine

Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction



Abstract

In reinforcement learning (RL) algorithms, exploratory control inputs are applied during learning to acquire the knowledge needed for decision making and control while the true dynamics of the controlled object are unknown. However, this exploration can cause undesirable situations by violating constraints on the state of the controlled object. In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces that uses a linear nominal model of the controlled object. Specifically, at each time step our method automatically decides whether or not to apply the exploratory input, depending on the current state and its predicted value. It also adjusts the variance-covariance matrix of the Gaussian policy used for exploration. We also show that this exploration process adjustment theoretically guarantees that the constraints are satisfied with a pre-specified probability, that is, that a joint chance constraint is satisfied at every time step. Finally, we illustrate the validity and effectiveness of our method through numerical simulation.
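The abstract does not state the adjustment rule itself. What follows is therefore only a minimal sketch of the general idea, and its assumptions are ours rather than the authors':

- a noise-free linear nominal model x_next = A x + B u;
- polyhedral state constraints H x <= d;
- a one-step horizon;
- a covariance adjusted by a single scalar factor c.

With a Gaussian exploratory input u ~ N(mu, c Sigma), the predicted next state is also Gaussian. Each constraint row can then be tightened with a normal quantile, and Boole's inequality splits the joint risk delta evenly over the m rows. The function `adjust_exploration`, its matrix helpers, and the parameter `min_scale` are hypothetical names introduced for illustration. Mismatch between the nominal model and the true dynamics is ignored here.

```python
from statistics import NormalDist


def mat_vec(M, v):
    """Return M @ v for a matrix given as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def mat_mul(P, Q):
    """Return P @ Q for matrices given as lists of rows."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def adjust_exploration(A, B, H, d, x, mu, Sigma, delta, min_scale=1e-3):
    """Hypothetical one-step safety filter for a Gaussian exploration policy.

    Assumptions (illustrative, not taken from the paper):
      * noise-free linear nominal model x_next = A x + B u,
      * exploratory input u ~ N(mu, c * Sigma) with scalar scale 0 <= c <= 1,
      * joint chance constraint Pr(H x_next <= d) >= 1 - delta on the next
        state, enforced via Boole's inequality with risk delta / m per row.

    Returns (explore, c): whether to sample exploratory noise, and the
    scale c to apply to Sigma.
    """
    if not 0.0 < delta < 0.5:
        raise ValueError("delta must lie in (0, 0.5)")
    m = len(H)
    z = NormalDist().inv_cdf(1.0 - delta / m)  # per-row Gaussian quantile, z > 0
    mean_next = [a + b for a, b in zip(mat_vec(A, x), mat_vec(B, mu))]
    cov_next = mat_mul(mat_mul(B, Sigma), transpose(B))  # B Sigma B^T

    c = 1.0
    for h, d_i in zip(H, d):
        margin = d_i - sum(h_j * m_j for h_j, m_j in zip(h, mean_next))
        if margin < 0:
            # The predicted mean already violates this row: switch exploration
            # off. (This sketch does not repair the mean input itself.)
            return False, 0.0
        var = sum(h_j * v_j for h_j, v_j in zip(h, mat_vec(cov_next, h)))  # h^T cov h
        if var > 0:
            # Row holds with prob. >= 1 - delta/m iff margin >= z * sqrt(c * var),
            # i.e. c <= (margin / (z * sqrt(var)))^2.
            c = min(c, (margin / (z * var ** 0.5)) ** 2)
    if c < min_scale:
        # Too little room left for meaningful exploration: use the mean input only.
        return False, 0.0
    return True, c


# Scalar example: x_next = x + u, constraint x_next <= 1, delta = 0.05.
# Prints (True, c) with c = (1 / z_0.95)^2, about 0.37.
print(adjust_exploration([[1.0]], [[1.0]], [[1.0]], [1.0], [0.0], [0.0], [[1.0]], 0.05))
```

The even risk split from Boole's inequality is a simple, conservative choice, and other allocations of delta across the rows are possible. Moving the state closer to the constraint boundary (for example x = 0.999 in the scalar case) shrinks c below `min_scale`, so the sketch stops exploring and applies only the mean input.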
