Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

Abstract

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting, where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.
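For intuition, the following minimal Python sketch (not the authors' implementation; the function name and the toy data are hypothetical) illustrates the idea behind an α-worst-case estimate of the performance gap: for each reward function R_i sampled from a Bayesian IRL posterior, take the difference in expected return between the optimal policy under R_i and the evaluation policy under R_i, then report the empirical α-quantile of those differences. The paper's actual bound additionally adjusts the quantile index to obtain a high-confidence upper bound; that refinement is omitted here.

import numpy as np

def alpha_worst_case_gap(gaps, alpha=0.95):
    """Empirical alpha-quantile of per-sample performance gaps.

    gaps[i] = (expected return of the optimal policy under reward sample R_i)
              - (expected return of the evaluation policy under R_i),
    where R_1, ..., R_m are drawn from a Bayesian IRL posterior.
    With probability roughly alpha over the posterior, the evaluation
    policy loses no more than the returned value relative to the
    optimal policy under the expert's unknown reward function.
    """
    gaps = np.sort(np.asarray(gaps, dtype=float))
    m = len(gaps)
    # Index of the ceil(alpha * m)-th order statistic (0-based).
    k = min(m - 1, int(np.ceil(alpha * m)) - 1)
    return gaps[k]

# Toy usage with synthetic gap samples (purely illustrative):
toy_gaps = np.random.gamma(shape=2.0, scale=0.5, size=1000)
print(alpha_worst_case_gap(toy_gaps, alpha=0.95))

Such an estimate can then drive risk-aware policy selection, for example by preferring the candidate policy with the smallest α-worst-case gap.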