A Runtime Monitoring Framework to Enforce Invariants on Reinforcement Learning Agents Exploring Complex Environments

Abstract

Without prior knowledge of the environment, a software agent can learn to achieve a goal using machine learning. Model-free Reinforcement Learning (RL) can be used to make the agent explore the environment and learn to achieve its goal by trial and error. Discovering effective policies to achieve the goal in a complex environment is a major challenge for RL. Furthermore, in safety-critical applications, such as robotics, an unsafe action may have catastrophic consequences for the agent or the environment. In this paper, we present an approach that uses runtime monitoring to prevent the reinforcement learning agent from performing 'wrong' actions and to exploit prior knowledge to explore the environment smartly. Each monitor is defined by a property that we want to enforce on the agent and a context. The monitors are orchestrated by a meta-monitor that activates and deactivates them dynamically according to the context in which the agent is learning. We have evaluated our approach by training the agent in randomly generated learning environments. Our results show that our approach blocks the agent from performing dangerous and safety-critical actions in all the generated environments. In addition, our approach helps the agent achieve its goal faster by providing feedback and shaping its reward during learning.
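To make the monitor and meta-monitor idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names (`Monitor`, `MetaMonitor`), the dictionary-based state, the `"noop"` fallback action, and the fixed penalty are all assumptions chosen for illustration. Each monitor pairs a context predicate with a property check, and the meta-monitor consults only the monitors whose context currently holds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representations, not taken from the paper.
State = Dict[str, object]   # e.g. {"front": "water"}
Action = str


@dataclass
class Monitor:
    """A property to enforce on the agent, plus the context where it applies."""
    name: str
    context: Callable[[State], bool]            # is this monitor relevant right now?
    violates: Callable[[State, Action], bool]   # would the action break the property?
    penalty: float = -1.0                       # assumed shaping reward for a blocked action


class MetaMonitor:
    """Activates monitors by context, blocks violating actions, shapes the reward."""

    def __init__(self, monitors: List[Monitor], safe_action: Action = "noop"):
        self.monitors = monitors
        self.safe_action = safe_action

    def active(self, state: State) -> List[Monitor]:
        # Only monitors whose context holds in this state are consulted.
        return [m for m in self.monitors if m.context(state)]

    def filter(self, state: State, action: Action) -> Tuple[Action, float]:
        """Return the action to execute and the shaping reward to add."""
        shaping = 0.0
        blocked = False
        for m in self.active(state):
            if m.violates(state, action):
                blocked = True
                shaping += m.penalty
        return (self.safe_action if blocked else action), shaping


# Example property: never step forward while water is directly ahead.
avoid_water = Monitor(
    name="avoid_water",
    context=lambda s: s.get("front") == "water",
    violates=lambda s, a: a == "forward",
)
meta = MetaMonitor([avoid_water])

print(meta.filter({"front": "water"}, "forward"))  # ('noop', -1.0)
print(meta.filter({"front": "floor"}, "forward"))  # ('forward', 0.0)
```

In a training loop, the agent's proposed action would pass through `meta.filter` before reaching the environment, and the returned shaping value would be added to the environment reward. Replacing a blocked action with a no-op is only one option; how the original framework handles a blocked action is not stated in the abstract.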

Bibliographic details

Source
IEEE/ACM International Workshop on Robotics Software Engineering | 2019 | pp. 5-12 | 8 pages
Conference location: Montreal (CA)
Authors
Piergiuseppe Mallozzi; Ezequiel Castellano; Patrizio Pelliccione; Gerardo Schneider; Kenji Tei;
Author affiliations

Chalmers-University of Gothenburg;

National Institute of Informatics Graduate University for Advanced Studies;

Chalmers-University of Gothenburg Università degli Studi dell’Aquila;

Waseda University;
Original format: PDF
Keywords
Monitoring; Runtime; Safety; Reinforcement learning; Probabilistic logic; Software agents;

Similar literature

1. Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent’s learning process in multiagent environments [J] . H. S. Al-Dayaa, D. B. Megherbi The Journal of Supercomputing . 2012, No. 1
2. Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent's learning process in multiagent environments [J] . H.S. Al-Dayaa, D.B. Megherbi Journal of supercomputing . 2012, No. 1
3. A unified framework for reinforcement learning, co-learning and meta-learning how to coordinate in collaborative multi-agent systems [J] . Predrag T. Tošić, Ricardo Vilalta Procedia Computer Science . 2010, No. 1
4. A Runtime Monitoring Framework to Enforce Invariants on Reinforcement Learning Agents Exploring Complex Environments [C] . Piergiuseppe Mallozzi, Ezequiel Castellano, Patrizio Pelliccione, IEEE/ACM International Workshop on Robotics Software Engineering . 2019
5. A Coordinated Reinforcement Learning Framework for Multi-Agent Virtual Environments. [D] . Sause, William J. 2013
6. FAIL Is Not a Four-Letter Word: A Theoretical Framework for Exploring Undergraduate Students’ Approaches to Academic Challenge and Responses to Failure in STEM Learning Environments [O] . Meredith A. Henry, Shayla Shorter, Louise Charkoudian, 2019
7. Modular Reinforcement Learning Architectures for Artificially Intelligent Agents in Complex Game Environments [O] . Christopher J. Hanna, Raymond J. Hickey, Darryl K. Charles, 2010
