Automatica

Simulation-based optimization of Markov decision processes: An empirical process theory approach


Abstract

We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. Both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the VC or pseudo-dimension of the policy class. We then propose a framework to obtain an ε-optimal policy from simulation, and provide the sample complexity of such an approach.
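To make the estimation step concrete, the following is a minimal sketch, not the paper's algorithm, of estimating each policy's expected discounted reward by the empirical average over independent simulation runs and then selecting the empirically best policy from a finite class. The two-state MDP, the policy class, and the parameters `gamma`, `horizon`, and `n_runs` are illustrative assumptions; the paper's bounds address much richer policy classes via their VC or pseudo-dimension.

```python
import random

# Assumed toy MDP for illustration: states {0, 1}, actions {0, 1}.
# The reward depends on both state and action, as in the abstract.
def reward(state, action):
    return 1.0 if state == 1 and action == 1 else 0.1 * action

def step(state, action):
    # Assumed transition kernel: action 1 tends to move toward state 1.
    p_to_1 = 0.8 if action == 1 else 0.3
    return 1 if random.random() < p_to_1 else 0

def discounted_return(policy, gamma=0.9, horizon=50):
    """One independent simulation run: truncated discounted reward under `policy`."""
    state, total, discount = 0, 0.0, 1.0
    for _ in range(horizon):  # horizon truncates the infinite sum; gamma**horizon is small
        action = policy(state)
        total += discount * reward(state, action)
        state = step(state, action)
        discount *= gamma
    return total

def empirical_value(policy, n_runs=2000):
    """Empirical average of the discounted reward over n_runs independent runs."""
    return sum(discounted_return(policy) for _ in range(n_runs)) / n_runs

# A small finite policy class used only for this sketch.
policies = {
    "always-0": lambda s: 0,
    "always-1": lambda s: 1,
    "greedy":   lambda s: 1 if s == 1 else 0,
}

estimates = {name: empirical_value(pi) for name, pi in policies.items()}
best = max(estimates, key=estimates.get)
print(estimates, "empirically best:", best)
```

With enough runs per policy, the empirical averages concentrate around the true expected discounted rewards uniformly over the class, so the empirically best policy is ε-optimal with high probability; the paper quantifies how many runs suffice.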
