American Control Conference

Safe Reinforcement Learning: Learning with Supervision Using a Constraint-Admissible Set

Abstract

Despite recent advances in Reinforcement Learning (RL), its applications in real-world engineering systems are still rare. The primary reason is that RL algorithms involve exploratory actions that can lead to system constraint violations. These violations can damage physical systems and even cause safety issues, e.g., battery overheating, robot breakdown, and car crashes, hindering RL deployment in many engineering applications. In this paper, we develop a novel safe RL framework that guarantees safety during learning by exploiting a constraint-admissible set for supervision. System knowledge and recursive feasibility techniques are exploited to construct a state-dependent constraint-admissible set. We develop a new learning scheme where the constraint-admissible set regulates the exploratory actions from the RL agent and simultaneously guides the agent to learn the system constraints with a penalty for control regulation. The proposed safe RL algorithm is demonstrated in an adaptive cruise control example where a nonlinear fuel economy cost function is optimized without violating system constraints. We demonstrate that the safe RL agent is able to learn the system constraints, so that the control supervisor gradually fades out.
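The supervision idea in the abstract can be illustrated with a toy sketch. This is not the paper's construction: the abstract does not give the vehicle model, the set computation, or the penalty form, so everything below is an assumption made for illustration. It assumes a two-state car-following model with Euler integration, a lead vehicle at constant speed, a single minimum-gap constraint, and a quadratic penalty on the supervisor's correction. All names (`ACCParams`, `step`, `is_safe`, `max_admissible_accel`, `supervise`) and all parameter values are hypothetical. A state counts as safe if full braking from it never violates the gap constraint, and an action is admissible if it leads to a safe state, which is one simple way to obtain recursive feasibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACCParams:
    # All values are illustrative assumptions, not taken from the paper.
    dt: float = 0.1       # sample time [s]
    a_min: float = -3.0   # strongest braking [m/s^2]
    a_max: float = 2.0    # strongest acceleration [m/s^2]
    d_min: float = 5.0    # minimum allowed gap [m]
    v_lead: float = 20.0  # lead vehicle speed, assumed constant [m/s]
    rho: float = 10.0     # weight of the penalty on supervisor corrections


def step(gap, v, a, p):
    """One Euler step: the gap changes with the current closing speed."""
    return gap + (p.v_lead - v) * p.dt, v + a * p.dt


def is_safe(gap, v, p):
    """True if braking at a_min from (gap, v) keeps gap >= d_min throughout."""
    while v > p.v_lead:  # the gap only shrinks while the ego car is faster
        if gap < p.d_min:
            return False
        gap, v = step(gap, v, p.a_min, p)
    return gap >= p.d_min


def max_admissible_accel(gap, v, p, n_grid=51):
    """Largest gridded acceleration whose successor state is safe, else None."""
    width = (p.a_max - p.a_min) / (n_grid - 1)
    candidates = [p.a_max - i * width for i in range(n_grid - 1)] + [p.a_min]
    for a in candidates:  # searched from most to least aggressive
        if is_safe(*step(gap, v, a, p), p):
            return a
    return None  # the current state is already outside the safe set


def supervise(a_rl, gap, v, p):
    """Clip the agent's action into the admissible interval.

    Returns the applied action and a penalty that grows with the size of
    the correction, to be subtracted from the agent's reward.
    """
    a_hi = max_admissible_accel(gap, v, p)
    if a_hi is None:
        a_hi = p.a_min  # fall back to full braking
    a_safe = min(max(a_rl, p.a_min), a_hi)
    penalty = p.rho * (a_rl - a_safe) ** 2
    return a_safe, penalty
```

In a training loop under these assumptions, the agent would propose `a_rl`, the environment would receive `a_safe`, and `penalty` would be subtracted from the reward. An agent that learns to stay inside the admissible interval then incurs zero penalty and is no longer corrected, which mirrors the fading-out of the supervisor described in the abstract.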
