IEEE/ACM International Workshop on Robotics Software Engineering

A Runtime Monitoring Framework to Enforce Invariants on Reinforcement Learning Agents Exploring Complex Environments

Abstract

Without prior knowledge of the environment, a software agent can learn to achieve a goal using machine learning. Model-free Reinforcement Learning (RL) can be used to make the agent explore the environment and learn to achieve its goal by trial and error. Discovering effective policies to achieve the goal in a complex environment is a major challenge for RL. Furthermore, in safety-critical applications such as robotics, an unsafe action may have catastrophic consequences for the agent or the environment. In this paper, we present an approach that uses runtime monitoring to prevent the reinforcement learning agent from performing 'wrong' actions and to exploit prior knowledge to explore the environment more effectively. Each monitor is defined by a property that we want to enforce on the agent and a context. The monitors are orchestrated by a meta-monitor that activates and deactivates them dynamically according to the context in which the agent is learning. We have evaluated our approach by training the agent in randomly generated learning environments. Our results show that our approach blocks the agent from performing dangerous and safety-critical actions in all the generated environments. In addition, our approach helps the agent achieve its goal faster by providing feedback and shaping its reward during learning.
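The abstract describes monitors defined by a property and a context, coordinated by a meta-monitor that blocks unsafe actions and shapes the reward. The following is a minimal sketch of that idea, not the authors' implementation. All names (`Monitor`, `MetaMonitor`, `check`, the `near_water` flag, the `wait` fallback action, and the penalty value) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: the state is a plain dict of observed features.
State = Dict[str, object]


@dataclass
class Monitor:
    """A property to enforce, plus the context in which it applies."""
    name: str
    context: Callable[[State], bool]        # is this monitor relevant now?
    violates: Callable[[State, str], bool]  # would this action break the property?
    penalty: float = -1.0                   # reward shaping applied on a violation


class MetaMonitor:
    """Activates monitors by context and filters the agent's proposed action."""

    def __init__(self, monitors: List[Monitor], safe_action: str = "wait"):
        self.monitors = monitors
        self.safe_action = safe_action  # assumed fallback when an action is blocked

    def active(self, state: State) -> List[Monitor]:
        # Only monitors whose context holds in the current state are active.
        return [m for m in self.monitors if m.context(state)]

    def check(self, state: State, action: str) -> Tuple[str, float]:
        """Return the action to execute and the shaping reward to add."""
        shaping = 0.0
        blocked = False
        for m in self.active(state):
            if m.violates(state, action):
                shaping += m.penalty
                blocked = True
        return (self.safe_action if blocked else action), shaping


# Hypothetical example: never step forward while next to water.
avoid_water = Monitor(
    name="avoid_water",
    context=lambda s: bool(s.get("near_water", False)),
    violates=lambda s, a: a == "forward",
)
meta = MetaMonitor([avoid_water])

blocked_result = meta.check({"near_water": True}, "forward")
allowed_result = meta.check({"near_water": False}, "forward")
```

In a training loop, the agent would propose an action, `check` would return the action actually sent to the environment, and the shaping value would be added to the environment reward before the learning update. Replacing a blocked action with a fixed fallback is one design choice among several; masking the action and resampling would be another.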
