ACM Transactions on Autonomous and Adaptive Systems

Probabilistic Policy Reuse for Safe Reinforcement Learning



Abstract

This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse with teacher advice for safe exploration in dangerous, continuous state-action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending up in failure from a given state. This risk function is defined in terms of how far that state is from the region of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of already learned knowledge, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments on the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
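The two ingredients the abstract names can be sketched in a few lines. The following is a minimal illustration under our own assumptions (the paper's exact formulas are not given here): the "known" region is approximated by the set of visited states, risk grows monotonically with Euclidean distance to that set, and the pi-reuse step mixes the reused teacher policy, the agent's own greedy policy, and random exploration. All function names and the exponential risk shape are hypothetical choices for illustration.

```python
import numpy as np

def risk(state, known_states, tau=1.0):
    """Monotonic risk in [0, 1]: 0 at a known state, approaching 1 as the
    Euclidean distance to the nearest known state grows (tau sets the scale).
    Assumes known_states is a list of numpy arrays of visited states."""
    if not known_states:
        return 1.0  # nothing is known yet: treat every state as risky
    d = min(np.linalg.norm(state - s) for s in known_states)
    return 1.0 - np.exp(-d / tau)

def pi_reuse_action(state, teacher_policy, own_policy, random_action,
                    psi, epsilon, rng):
    """One pi-reuse-style choice: with probability psi follow the reused
    (teacher) policy; otherwise act epsilon-greedily with the own policy."""
    if rng.random() < psi:
        return teacher_policy(state)
    if rng.random() < epsilon:
        return random_action()
    return own_policy(state)
```

In a safe-exploration loop, the risk value of the current state could gate when `psi` is raised (so the teacher is consulted more in dangerous, i.e. poorly known, regions), while `psi` decays toward 0 in well-explored areas so the agent exploits its own learned policy.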


