Conference on Neural Information Processing Systems

On the Correctness and Sample Complexity of Inverse Reinforcement Learning



Abstract

Inverse reinforcement learning (IRL) is the problem of finding a reward function that generates a given optimal policy for a given Markov Decision Process. This paper presents an algorithm-independent geometric analysis of the IRL problem with finite states and actions. Motivated by this geometric analysis, an L1-regularized Support Vector Machine formulation of the IRL problem is then proposed, with the basic objective of inverse reinforcement learning in mind: to find a reward function that generates a specified optimal policy. The paper further analyzes the proposed formulation with n states and k actions, and shows a sample complexity of O(d^2 log(nk)), for transition probability matrices with at most d nonzeros per row, for recovering a reward function that generates a policy satisfying Bellman's optimality condition with respect to the true transition probabilities.
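The kind of formulation the abstract describes can be illustrated with a small convex program. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it encodes Ng-and-Russell-style Bellman optimality constraints for the observed policy as margin constraints with hinge slacks, and minimizes an L1 penalty on the reward. The function name `irl_l1_svm`, the use of cvxpy, and all parameter choices are illustrative assumptions.

```python
# Minimal sketch of an L1-regularized, SVM-style IRL program for a finite MDP.
# Assumes Ng-Russell-style Bellman optimality constraints with a margin;
# solver choice and all names are illustrative, not taken from the paper.
import numpy as np
import cvxpy as cp

def irl_l1_svm(P, policy, gamma=0.9, margin=1.0, lam=1.0):
    """P: (k, n, n) transition matrices, one per action (<= d nonzeros per row).
    policy: length-n array; policy[s] is the observed optimal action in state s.
    Returns a state reward vector r under which `policy` satisfies the Bellman
    optimality constraints with the given margin (softened by hinge slacks)."""
    k, n, _ = P.shape
    r = cp.Variable(n)
    xi = cp.Variable((n, k), nonneg=True)                  # hinge slack variables

    # Transition rows actually followed under the policy, and the value
    # function V = (I - gamma * P_pi)^{-1} r as a linear function of r.
    P_pi = np.stack([P[policy[s], s] for s in range(n)])   # shape (n, n)
    M = np.linalg.inv(np.eye(n) - gamma * P_pi)
    V = M @ r

    constraints = []
    for s in range(n):
        for a in range(k):
            if a == policy[s]:
                continue
            # The prescribed action must beat action a by `margin`,
            # up to the slack xi[s, a].
            constraints.append((P_pi[s] - P[a, s]) @ V >= margin - xi[s, a])

    objective = cp.Minimize(lam * cp.norm1(r) + cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return r.value

if __name__ == "__main__":
    # Toy 3-state, 2-action MDP with a hand-picked "optimal" policy.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(2, 3))             # rows sum to 1
    policy = np.array([0, 1, 0])
    print(irl_l1_svm(P, policy))
```

The L1 term plays the role of the regularizer whose sparsity-inducing effect underlies the O(d^2 log(nk)) sample-complexity bound quoted above; the hinge slacks make the program feasible even when the empirical transition estimates are noisy.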

