Journal: Machine Learning

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model



Abstract

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and discount factor γ∈[0,1), only O(N log(N/δ)/((1−γ)^3 ε^2)) state-transition samples are required to find an ε-optimal estimate of the action-value function with probability (w.p.) 1−δ. Further, we prove that, for small values of ε, an order of O(N log(N/δ)/((1−γ)^3 ε^2)) samples is required to find an ε-optimal policy w.p. 1−δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1−γ)^3 ε^2)) on the sample complexity of estimating the optimal action-value function to ε accuracy. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bounds match the lower bound in terms of N, ε, δ and 1/(1−γ) up to a constant factor. Also, both our lower bound and upper bound improve on the state of the art in terms of their dependence on 1/(1−γ).
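The model-based scheme the abstract analyzes can be sketched as follows: for each of the N state-action pairs, draw n i.i.d. next-state samples from the generative model, form the empirical transition kernel, and run value iteration on the resulting empirical MDP. This is a minimal illustrative sketch, not the paper's exact algorithm or analysis; the function names and the deterministic-reward assumption are ours.

```python
import numpy as np

def empirical_q_iteration(sample_next_state, rewards, n_states, n_actions,
                          gamma=0.9, n_samples=100, n_iters=200):
    """Model-based Q-value iteration on an empirical MDP built from a
    generative model (a sketch of the algorithm class the paper analyzes).

    sample_next_state(s, a, size): returns `size` i.i.d. next states (ints)
        drawn from the generative model at state-action pair (s, a).
    rewards: (n_states, n_actions) array of deterministic rewards.
    """
    # Build the empirical transition kernel P_hat from n_samples per (s, a).
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            next_states = sample_next_state(s, a, n_samples)
            counts = np.bincount(next_states, minlength=n_states)
            P_hat[s, a] = counts / n_samples
    # Standard value iteration on the empirical model: the Bellman operator
    # is a gamma-contraction, so Q converges geometrically to Q_hat*.
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)            # greedy value of each state
        Q = rewards + gamma * P_hat @ V
    return Q
```

The sample-complexity question the paper answers is how large n (hence the total budget N·n) must be so that the fixed point of this empirical Bellman operator is ε-close to the true optimal action-value function with probability 1−δ.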
