Journal: Machine Learning

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model



Abstract

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and discount factor γ∈[0,1), only O(N log(N/δ)/((1−γ)^3 ε^2)) state-transition samples are required to find an ε-optimal estimate of the action-value function with probability (w.p.) 1−δ. Further, we prove that, for small values of ε, an order of O(N log(N/δ)/((1−γ)^3 ε^2)) samples is required to find an ε-optimal policy w.p. 1−δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1−γ)^3 ε^2)) on the sample complexity of estimating the optimal action-value function to ε accuracy. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bounds match the lower bound in terms of N, ε, δ and 1/(1−γ) up to a constant factor. Also, both our lower bound and upper bound improve on the state of the art in terms of their dependence on 1/(1−γ).
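The model-based scheme the abstract analyzes can be sketched as follows: for each of the N state-action pairs, draw n i.i.d. next-state samples from the generative model, form the empirical transition kernel, and run value iteration on the resulting empirical MDP. This is a minimal illustrative sketch, not the paper's exact algorithm or analysis; the function names and the deterministic-reward assumption are ours.

```python
import numpy as np

def empirical_q_iteration(sample_next_state, rewards, n_states, n_actions,
                          gamma=0.9, n_samples=100, n_iters=200):
    """Model-based Q-value iteration on an empirical MDP built from a
    generative model (a sketch of the algorithm class the paper analyzes).

    sample_next_state(s, a, size): returns `size` i.i.d. next states (ints)
        drawn from the generative model at state-action pair (s, a).
    rewards: (n_states, n_actions) array of deterministic rewards.
    """
    # Build the empirical transition kernel P_hat from n_samples per (s, a).
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            next_states = sample_next_state(s, a, n_samples)
            counts = np.bincount(next_states, minlength=n_states)
            P_hat[s, a] = counts / n_samples
    # Standard value iteration on the empirical model: the Bellman operator
    # is a gamma-contraction, so Q converges geometrically to Q_hat*.
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)            # greedy value of each state
        Q = rewards + gamma * P_hat @ V
    return Q
```

The sample-complexity question the paper answers is how large n (hence the total budget N·n) must be so that the fixed point of this empirical Bellman operator is ε-close to the true optimal action-value function with probability 1−δ.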
