Computers & Operations Research

Policy iteration type algorithms for recurrent state Markov decision processes

Abstract

We introduce and analyze several new policy iteration type algorithms for average cost Markov decision processes (MDPs). We limit attention to "recurrent state" processes where there exists a state which is recurrent under all stationary policies, and our analysis applies to finite-state problems with compact constraint sets, continuous transition probability functions, and lower-semicontinuous cost functions. The analysis makes use of an underlying relationship between recurrent state MDPs and the so-called stochastic shortest path problems of Bertsekas and Tsitsiklis (Math. Oper. Res. 16(3) (1991) 580). After extending this relationship, we establish the convergence of the new policy iteration type algorithms either to optimality or to within ε > 0 of the optimal average cost.
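To make the setting concrete, here is a minimal sketch of classical Howard-style policy iteration for an average-cost finite MDP under the recurrent-state (unichain) assumption the abstract describes. This is not the paper's new algorithms; it is the baseline scheme they extend. The evaluation step pins the bias at a reference state `ref_state` (assumed recurrent under every stationary policy) and solves h = c_mu - g·1 + P_mu h; the improvement step is greedy in the resulting Q-values. All names and the array layout (`P[a, s, s']`, `c[s, a]`) are illustrative assumptions.

```python
import numpy as np

def policy_iteration_avg_cost(P, c, ref_state=0, max_iter=100):
    """Average-cost policy iteration sketch for a finite MDP.

    Assumes `ref_state` is recurrent under every stationary policy
    (the "recurrent state" condition), so each policy is unichain and
    the evaluation equations below have a unique solution.

    P : (A, S, S) array, P[a, s, s'] = transition probability.
    c : (S, A) array, c[s, a] = one-stage cost.
    Returns (policy, gain g, bias h) with h[ref_state] = 0.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve  (I - P_mu) h + g*1 = c_mu,  h[ref_state] = 0.
        P_mu = P[policy, np.arange(S), :]        # (S, S) chain under mu
        c_mu = c[np.arange(S), policy]           # (S,) cost under mu
        M = np.zeros((S + 1, S + 1))
        M[:S, :S] = np.eye(S) - P_mu
        M[:S, S] = 1.0                           # coefficient of the gain g
        M[S, ref_state] = 1.0                    # pin the bias: h[ref_state] = 0
        b = np.concatenate([c_mu, [0.0]])
        sol = np.linalg.lstsq(M, b, rcond=None)[0]
        h, g = sol[:S], sol[S]
        # Policy improvement: greedy in  c(s, a) + sum_{s'} P(s'|s, a) h(s').
        Q = c + np.einsum('asj,j->sa', P, h)
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):   # converged: mu is unimprovable
            break
        policy = new_policy
    return policy, g, h
```

At convergence the pair (g, h) satisfies the average-cost optimality equation g + h(s) = min_a [c(s, a) + Σ_{s'} P(s'|s, a) h(s')], which can be checked numerically on any small instance.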
