Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction

Yoshihiro Okawa; Tomotake Sasaki; Hidenao Iwane

Abstract

In reinforcement learning (RL) algorithms, exploratory control inputs are used during learning to acquire knowledge for decision making and control, while the true dynamics of the controlled object are unknown. However, this exploration sometimes causes undesired situations by violating constraints on the state of the controlled object. In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces that utilizes a linear nominal model of the controlled object. Specifically, our proposed method automatically selects whether or not the exploratory input is used at each time step, depending on the state and its predicted value, and adjusts the variance-covariance matrix used in the Gaussian policy for exploration. We also show that our exploration process adjustment method theoretically guarantees the satisfaction of the constraints with a pre-specified probability, that is, the satisfaction of a joint chance constraint at every time step. Finally, we illustrate the validity and the effectiveness of our method through numerical simulation.
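The idea described in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper; it is a simplified illustration under assumptions that the abstract does not state: the nominal model x_next = A x + B u is treated as exact, the state constraints are linear inequalities H x <= d, the joint chance constraint is split across rows with Boole's inequality, and the exploration covariance is a scaled base matrix s^2 * sigma0. The function name `exploration_scale` and all of its parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def exploration_scale(x, u_nominal, A, B, H, d, sigma0, eta, s_max=1.0):
    """Return a scale s in [0, s_max] for the exploration covariance s**2 * sigma0.

    Illustrative assumptions (not taken from the paper):
      - the nominal model x_next = A x + B u is exact,
      - constraints are H x_next <= d,
      - the joint chance level eta is split evenly over the rows of H.
    """
    if not 0.5 < eta < 1.0:
        raise ValueError("eta must lie in (0.5, 1)")

    # One-step prediction without exploration noise.
    x_pred = A @ x + B @ u_nominal
    margin = d - H @ x_pred

    # If the predicted state is already on or over a boundary, do not explore.
    if np.any(margin <= 0.0):
        return 0.0

    m = H.shape[0]
    # Per-row quantile: each row may fail with probability (1 - eta) / m.
    z = NormalDist().inv_cdf(1.0 - (1.0 - eta) / m)

    # Standard deviation of each constraint row when s = 1.
    G = H @ B
    std = np.sqrt(np.einsum("ij,jk,ik->i", G, sigma0, G))

    # Rows that the input cannot affect impose no limit on s.
    limits = np.full(m, np.inf)
    affected = std > 0.0
    limits[affected] = margin[affected] / (z * std[affected])

    return float(min(s_max, limits.min()))


# Usage: scalar integrator with |x| <= 1 and a 95% joint chance level.
A = np.array([[1.0]])
B = np.array([[1.0]])
H = np.array([[1.0], [-1.0]])
d = np.array([1.0, 1.0])
sigma0 = np.array([[1.0]])

s_center = exploration_scale(np.array([0.0]), np.array([0.0]), A, B, H, d, sigma0, 0.95)
s_near = exploration_scale(np.array([0.5]), np.array([0.0]), A, B, H, d, sigma0, 0.95)
s_unsafe = exploration_scale(np.array([0.9]), np.array([0.2]), A, B, H, d, sigma0, 0.95)
```

In this sketch the allowed exploration scale shrinks as the predicted state approaches a constraint boundary and drops to zero when the prediction already violates it, which mirrors the two mechanisms named in the abstract (switching the exploratory input on or off and adjusting the covariance). Handling the mismatch between the nominal model and the true dynamics, which the paper's guarantee would need to address, is outside this sketch.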


Bibliographic record

Source: IFAC PapersOnLine, 2020, Issue 2, 8 pages
Authors: Yoshihiro Okawa; Tomotake Sasaki; Hidenao Iwane
Original format: PDF
Keywords: Reinforcement learning; Learning algorithm; Safe exploration; Safety-critical; Chance constraint


