Safe reinforcement learning in high-risk tasks through policy improvement

Abstract

Reinforcement Learning (RL) methods are widely used for dynamic control tasks. In many cases, these are high-risk tasks in which the trial-and-error process may select actions whose execution from unsafe states can be catastrophic. In addition, many of these tasks have continuous state and action spaces, which makes the learning problem harder and difficult to approach with conventional RL algorithms. So, when the agent begins to interact with a risky environment with a large state-action space, an important question arises: how can we prevent the exploration of the state-action space from causing damage to the learning (or other) systems? In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety is concerned with states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results using the helicopter hovering task from the RL Competition.
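The abstract's idea of keeping exploration away from states that can lead to damage, while starting from a suboptimal but robust behavior, can be illustrated with a small sketch. This is not the authors' algorithm. It assumes, hypothetically, that risk is approximated by the distance from the current state to previously visited states (threshold `theta`), that exploration is Gaussian noise added to remembered actions (`sigma`), and that a hand-written `baseline` controller stands in for the robust behavior. All names (`SafeCaseBasedPolicy`, `theta`, `sigma`, `baseline`) are illustrative.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Case = Tuple[State, Action]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SafeCaseBasedPolicy:
    """Hypothetical sketch of risk-aware exploration around a safe baseline.

    A state counts as known (low risk) if a stored case lies within distance
    `theta` of it. Known states reuse the stored action plus Gaussian noise
    (exploration); unknown states are delegated to the baseline controller,
    and the resulting (state, action) pair is stored as a new case.
    """

    def __init__(self, baseline: Callable[[State], Action], theta: float,
                 sigma: float, seed: int = 0) -> None:
        self.baseline = baseline  # suboptimal but robust controller (assumed given)
        self.theta = theta        # hypothetical risk threshold
        self.sigma = sigma        # std. dev. of exploratory action noise
        self.cases: List[Case] = []
        self.rng = random.Random(seed)

    def nearest(self, s: State) -> Tuple[float, Optional[Case]]:
        """Return (distance, case) for the closest stored case, or (inf, None)."""
        best: Tuple[float, Optional[Case]] = (math.inf, None)
        for case in self.cases:
            d = euclidean(s, case[0])
            if d < best[0]:
                best = (d, case)
        return best

    def is_risky(self, s: State) -> bool:
        """A state is risky when no stored case lies within `theta` of it."""
        return self.nearest(s)[0] > self.theta

    def act(self, s: State) -> Tuple[Action, bool]:
        """Return (action, used_baseline) for state `s`."""
        dist, case = self.nearest(s)
        if case is None or dist > self.theta:
            # Unknown (risky) region: fall back to the safe baseline and remember it.
            action = tuple(self.baseline(s))
            self.cases.append((tuple(s), action))
            return action, True
        # Known region: perturb the remembered action to explore nearby behavior.
        noisy = tuple(x + self.rng.gauss(0.0, self.sigma) for x in case[1])
        return noisy, False
```

For instance, with a baseline `lambda s: tuple(-0.5 * x for x in s)` and `theta=0.1`, the first call `act((1.0, 0.0))` falls back to the baseline, while a later call at `(1.05, 0.0)` lies within `theta` of the stored case and receives a perturbed copy of the remembered action. Restricting noise to known regions is what ties exploration to the baseline; the step that actually improves stored actions from observed returns is omitted here.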

Bibliographic Record

Source: 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2011, pp. 76-83 (8 pages)
Authors: Garcia Polo Francisco Javier; Fernandez Rebollo Fernando
Original format: PDF
Chinese Library Classification: Software engineering

Similar Documents

1. Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning [J]. Naoto Horie, Tohgoroh Matsui, Koichi Moriyama. Artificial Life and Robotics, 2019, No. 3.
2. Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement [J]. Ali Ezzeddine, Nafee Mourad, Babak Nadjar Araabi. Expert Systems with Applications, 2018, issue DECa.
3. Exploring Feature Dimensions to Learn a New Policy in an Uninformed Reinforcement Learning Task [J]. Oh-hyeon Choung, Sang Wan Lee, Yong Jeong. Scientific Reports, 2017, No. 1.
4. Safe reinforcement learning in high-risk tasks through policy improvement [C]. Garcia Polo Francisco Javier, Fernandez Rebollo Fernando. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2011.
5. Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning [D]. Hu, Dingcheng. 2019.
6. Exploring Feature Dimensions to Learn a New Policy in an Uninformed Reinforcement Learning Task [O]. Oh-hyeon Choung, Sang Wan Lee, Yong Jeong.
7. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks [O]. Richard Cheng, Gábor Orosz, Richard M. Murray. 2019.