Computers & Operations Research

Policy iteration type algorithms for recurrent state Markov decision processes

Abstract

We introduce and analyze several new policy iteration type algorithms for average cost Markov decision processes (MDPs). We limit attention to "recurrent state" processes where there exists a state which is recurrent under all stationary policies, and our analysis applies to finite-state problems with compact constraint sets, continuous transition probability functions, and lower-semicontinuous cost functions. The analysis makes use of an underlying relationship between recurrent state MDPs and the so-called stochastic shortest path problems of Bertsekas and Tsitsiklis (Math. Oper. Res. 16(3) (1991) 580). After extending this relationship, we establish the convergence of the new policy iteration type algorithms either to optimality or to within ε > 0 of the optimal average cost.
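To make the setting concrete, here is a minimal sketch of classical Howard-style policy iteration for an average-cost finite MDP under the recurrent-state (unichain) assumption the abstract describes. This is not the paper's new algorithms; it is the baseline scheme they extend. The evaluation step pins the bias at a reference state `ref_state` (assumed recurrent under every stationary policy) and solves h = c_mu - g·1 + P_mu h; the improvement step is greedy in the resulting Q-values. All names and the array layout (`P[a, s, s']`, `c[s, a]`) are illustrative assumptions.

```python
import numpy as np

def policy_iteration_avg_cost(P, c, ref_state=0, max_iter=100):
    """Average-cost policy iteration sketch for a finite MDP.

    Assumes `ref_state` is recurrent under every stationary policy
    (the "recurrent state" condition), so each policy is unichain and
    the evaluation equations below have a unique solution.

    P : (A, S, S) array, P[a, s, s'] = transition probability.
    c : (S, A) array, c[s, a] = one-stage cost.
    Returns (policy, gain g, bias h) with h[ref_state] = 0.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve  (I - P_mu) h + g*1 = c_mu,  h[ref_state] = 0.
        P_mu = P[policy, np.arange(S), :]        # (S, S) chain under mu
        c_mu = c[np.arange(S), policy]           # (S,) cost under mu
        M = np.zeros((S + 1, S + 1))
        M[:S, :S] = np.eye(S) - P_mu
        M[:S, S] = 1.0                           # coefficient of the gain g
        M[S, ref_state] = 1.0                    # pin the bias: h[ref_state] = 0
        b = np.concatenate([c_mu, [0.0]])
        sol = np.linalg.lstsq(M, b, rcond=None)[0]
        h, g = sol[:S], sol[S]
        # Policy improvement: greedy in  c(s, a) + sum_{s'} P(s'|s, a) h(s').
        Q = c + np.einsum('asj,j->sa', P, h)
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):   # converged: mu is unimprovable
            break
        policy = new_policy
    return policy, g, h
```

At convergence the pair (g, h) satisfies the average-cost optimality equation g + h(s) = min_a [c(s, a) + Σ_{s'} P(s'|s, a) h(s')], which can be checked numerically on any small instance.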
