Conference on Neural Information Processing Systems

On the Correctness and Sample Complexity of Inverse Reinforcement Learning



Abstract

Inverse reinforcement learning (IRL) is the problem of finding a reward function that generates a given optimal policy for a given Markov Decision Process. This paper presents an algorithm-independent geometric analysis of the IRL problem with finite states and actions. Motivated by this geometric analysis, an L1-regularized Support Vector Machine formulation of the IRL problem is then proposed, with the basic objective of inverse reinforcement learning in mind: to find a reward function that generates a specified optimal policy. The paper further analyzes the proposed formulation with n states and k actions, and shows a sample complexity of O(d^2 log(nk)), for transition probability matrices with at most d nonzeros per row, for recovering a reward function that generates a policy satisfying Bellman's optimality condition with respect to the true transition probabilities.
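The kind of formulation the abstract describes can be illustrated with a small convex program. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it encodes Ng-and-Russell-style Bellman optimality constraints for the observed policy as margin constraints with hinge slacks, and minimizes an L1 penalty on the reward. The function name `irl_l1_svm`, the use of cvxpy, and all parameter choices are illustrative assumptions.

```python
# Minimal sketch of an L1-regularized, SVM-style IRL program for a finite MDP.
# Assumes Ng-Russell-style Bellman optimality constraints with a margin;
# solver choice and all names are illustrative, not taken from the paper.
import numpy as np
import cvxpy as cp

def irl_l1_svm(P, policy, gamma=0.9, margin=1.0, lam=1.0):
    """P: (k, n, n) transition matrices, one per action (<= d nonzeros per row).
    policy: length-n array; policy[s] is the observed optimal action in state s.
    Returns a state reward vector r under which `policy` satisfies the Bellman
    optimality constraints with the given margin (softened by hinge slacks)."""
    k, n, _ = P.shape
    r = cp.Variable(n)
    xi = cp.Variable((n, k), nonneg=True)                  # hinge slack variables

    # Transition rows actually followed under the policy, and the value
    # function V = (I - gamma * P_pi)^{-1} r as a linear function of r.
    P_pi = np.stack([P[policy[s], s] for s in range(n)])   # shape (n, n)
    M = np.linalg.inv(np.eye(n) - gamma * P_pi)
    V = M @ r

    constraints = []
    for s in range(n):
        for a in range(k):
            if a == policy[s]:
                continue
            # The prescribed action must beat action a by `margin`,
            # up to the slack xi[s, a].
            constraints.append((P_pi[s] - P[a, s]) @ V >= margin - xi[s, a])

    objective = cp.Minimize(lam * cp.norm1(r) + cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return r.value

if __name__ == "__main__":
    # Toy 3-state, 2-action MDP with a hand-picked "optimal" policy.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(2, 3))             # rows sum to 1
    policy = np.array([0, 1, 0])
    print(irl_l1_svm(P, policy))
```

The L1 term plays the role of the regularizer whose sparsity-inducing effect underlies the O(d^2 log(nk)) sample-complexity bound quoted above; the hinge slacks make the program feasible even when the empirical transition estimates are noisy.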

