Source: International Conference on Machine Learning

Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret

Abstract

Lifelong reinforcement learning provides a promising framework for developing versatile agents that can accumulate knowledge over a lifetime of experience and rapidly learn new tasks by building upon prior knowledge. However, current lifelong learning methods exhibit non-vanishing regret as the amount of experience increases, and include limitations that can lead to suboptimal or unsafe control policies. To address these issues, we develop a lifelong policy gradient learner that operates in an adversarial setting to learn multiple tasks online while enforcing safety constraints on the learned policies. We demonstrate, for the first time, sublinear regret for lifelong policy search, and validate our algorithm on several benchmark dynamical systems and an application to quadrotor control.
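To make "sublinear regret" and "safety constraints" concrete, here is a minimal sketch of generic projected online gradient descent. It is not the algorithm from the paper: the quadratic losses, the norm ball standing in for a "safe" parameter set, the step-size schedule, and all function names (`project_to_ball`, `projected_ogd`, `regret`) are assumptions chosen for illustration. Regret is the learner's cumulative loss minus the loss of the best fixed feasible point in hindsight; it is sublinear when regret divided by the number of rounds tends to zero.

```python
import numpy as np


def project_to_ball(theta, radius):
    """Euclidean projection onto {x : ||x|| <= radius}, the assumed 'safe' set."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)


def projected_ogd(targets, radius=1.0, step_scale=0.5):
    """Projected online gradient descent on losses l_t(x) = ||x - targets[t]||^2.

    The loss for round t is revealed only after the learner commits to its
    iterate. Returns the iterates that were played, one per round.
    """
    theta = np.zeros(targets.shape[1])
    iterates = []
    for t, target in enumerate(targets, start=1):
        iterates.append(theta.copy())        # commit to the current parameters
        grad = 2.0 * (theta - target)        # gradient of the revealed loss
        step = step_scale / np.sqrt(t)       # decaying step size
        theta = project_to_ball(theta - step * grad, radius)  # stay feasible
    return np.array(iterates)


def regret(iterates, targets, radius=1.0):
    """Cumulative online loss minus the loss of the best fixed feasible point."""
    online = np.sum((iterates - targets) ** 2)
    # For squared-distance losses, the best fixed point in the ball is the
    # projection of the mean target onto the ball.
    best = project_to_ball(targets.mean(axis=0), radius)
    offline = np.sum((best - targets) ** 2)
    return online - offline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for horizon in (100, 1000, 10000):
        targets = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        iterates = projected_ogd(targets)
        print(horizon, regret(iterates, targets) / horizon)
```

The projection step is what keeps every played iterate inside the constraint set, and the decaying step size is the standard choice that yields a regret bound growing on the order of the square root of the number of rounds for bounded convex losses.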