IEEE Conference on Decision and Control

PAC Bounds for Simulation-Based Optimization of Markov Decision Processes


Abstract

We generalize the PAC learning framework for Markov decision processes developed in [18]. We allow the reward function to depend on both the state and the action, and both the state and action spaces may be countably infinite. We obtain an estimate of the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds, in terms of the VC or pseudo dimension of the policy class, on the number of runs needed for the empirical average to converge uniformly to the expected reward over a class of policies. We then propose a framework for obtaining an ε-optimal policy from simulation and provide the sample complexity of this approach.
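To make the estimation step concrete, below is a minimal Monte Carlo sketch of estimating a policy's expected discounted reward as an empirical average over independent simulation runs. It is illustrative only: the simulator interface (step, reward), the discount factor, the truncation horizon, and the number of runs are assumed placeholders, not quantities or code from the paper; the paper's contribution is bounding the required number of runs via the VC or pseudo dimension of the policy class.

    import random

    def estimate_value(policy, step, reward, s0, gamma=0.95, horizon=200, n_runs=1000):
        """Empirical average of the (truncated) discounted reward of `policy`
        over `n_runs` independent simulation runs. Hypothetical sketch."""
        total = 0.0
        for _ in range(n_runs):              # independent simulation runs
            s, ret = s0, 0.0
            for t in range(horizon):         # truncate the infinite discounted sum
                a = policy(s)
                ret += (gamma ** t) * reward(s, a)
                s = step(s, a)               # sample the next state from the simulator
            total += ret
        return total / n_runs                # empirical estimate of the value of `policy`

    if __name__ == "__main__":
        # Toy two-state chain (hypothetical): the state flips with probability 1/2,
        # the reward is 1 in state 1 and 0 in state 0, and the policy is a dummy action.
        step = lambda s, a: 1 - s if random.random() < 0.5 else s
        reward = lambda s, a: float(s)
        print(estimate_value(lambda s: 0, step, reward, s0=0))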
