Computers & Operations Research

An empirical study of policy convergence in Markov decision process value iteration



Abstract

The value iteration algorithm is a well-known technique for generating solutions to discounted Markov decision process (MDP) models. Although simple to implement, the approach is nevertheless limited in situations where many Markov decision processes must be solved, such as in real-time state-based control problems or in simulation/optimization problems, because of the potentially large number of iterations required for the value function to converge to an ε-optimal solution. Experimental results suggest, however, that the sequence of solution policies associated with each iteration of the algorithm converges much more rapidly than does the value function. This behavior has significant implications for designing solution approaches for MDPs, yet it has neither been explicitly characterized in the literature nor generated significant discussion. This paper seeks to generate such discussion by providing comparative empirical convergence results and exploring several predictors that allow estimation of policy convergence speed based on existing MDP parameters.
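The phenomenon described above can be observed directly by instrumenting a standard value iteration loop. The following is a minimal sketch, not the paper's experimental setup: it runs value iteration on a randomly generated discounted MDP (assumed sizes, discount factor, and tolerance) and records the last iteration at which the greedy policy changed alongside the iteration at which the usual ε-optimality stopping rule on the value function is satisfied.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, eps=1e-4, max_iter=100_000):
    """P: transition probabilities, shape (A, S, S); R: rewards, shape (A, S)."""
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    prev_policy = np.full(num_states, -1)
    last_policy_change = 0

    for k in range(1, max_iter + 1):
        # Bellman backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)            # shape (A, S)
        V_new = Q.max(axis=0)
        policy = Q.argmax(axis=0)

        if not np.array_equal(policy, prev_policy):
            last_policy_change = k         # greedy policy is still moving

        delta = np.max(np.abs(V_new - V))
        V, prev_policy = V_new, policy

        # Standard stopping rule guaranteeing an eps-optimal value function.
        if delta < eps * (1 - gamma) / (2 * gamma):
            return policy, last_policy_change, k

    return prev_policy, last_policy_change, max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 50, 4                            # assumed problem size for illustration
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)       # normalize rows to valid distributions
    R = rng.random((A, S))

    policy, k_policy, k_value = value_iteration(P, R)
    print(f"greedy policy last changed at iteration {k_policy}; "
          f"value function met the eps stopping rule at iteration {k_value}")
```

On such randomly generated instances the greedy policy typically stops changing well before the value-function stopping criterion is reached, which is the gap the paper studies empirically.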
