Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret

Abstract

Lifelong reinforcement learning provides a promising framework for developing versatile agents that can accumulate knowledge over a lifetime of experience and rapidly learn new tasks by building upon prior knowledge. However, current lifelong learning methods exhibit non-vanishing regret as the amount of experience increases, and include limitations that can lead to suboptimal or unsafe control policies. To address these issues, we develop a lifelong policy gradient learner that operates in an adversarial setting to learn multiple tasks online while enforcing safety constraints on the learned policies. We demonstrate, for the first time, sublinear regret for lifelong policy search, and validate our algorithm on several benchmark dynamical systems and an application to quadrotor control.

Bibliographic record

Source: International Conference on Machine Learning, 2016, 17 pages
Authors: Haitham Bou Ammar; Rasul Tutunov; Eric Eaton
Original format: PDF
Chinese Library Classification: TP181-53

Similar literature

1. Zero-shot policy generation in lifelong reinforcement learning [J]. Qian Yi-Ming, Xiong Fang-Zhou, Liu Zhi-Yong. Neurocomputing, 2021, issue Jula25.
2. Policy and Value Transfer in Lifelong Reinforcement Learning [J]. David Abel, Yuu Jinnai, Sophie Yue Guo. JMLR: Workshop and Conference Proceedings, 2018, issue 2010.
3. European Union Policies on Lifelong Learning: In-between Competitiveness Enhancement and Social Stability Reinforcement [J]. Eugenia Panitsidou, Eleni Griva, Dora Chostelidou. Procedia - Social and Behavioral Sciences, 2012, issue 2.
4. Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret [C]. Haitham Bou Ammar, Rasul Tutunov, Eric Eaton. International Conference on Machine Learning, 2016.
5. Bayesian Methods for Knowledge Transfer and Policy Search in Reinforcement Learning [D]. Wilson, Aaron. 2012.
6. Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning [O]. Tulika Saha, Sriparna Saha, Pushpak Bhattacharyya. 2020.
7. Autonomous Driving using Safe Reinforcement Learning by Incorporating a Regret-based Human Lane-Changing Decision Model [O]. Dong Chen, Longsheng Jiang, Yue Wang. 2020.
