Journal: Computers & Operations Research

Numerical analysis of continuous time Markov decision processes over finite horizons



Abstract

Continuous time Markov decision processes (CTMDPs) with a finite state and action space have been studied for a long time. It is known that, under fairly general conditions, the reward gained over a finite horizon is maximized by a so-called piecewise constant policy, which changes only finitely often within a finite interval. Although this result has been available for more than 30 years, numerical approaches to computing the optimal policy and reward have been restricted to discretization methods, which are known to converge to the true solution as the discretization step goes to zero. In this paper, we present a new method, based on uniformization of the CTMDP, that computes an ε-optimal policy up to a predefined precision in a numerically stable way using adaptive time steps.
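Uniformization, the core technique the abstract refers to, replaces a continuous-time chain with generator Q by a discrete-time chain P = I + Q/Λ subordinated to a Poisson process of rate Λ ≥ max_i |q_ii|. A minimal NumPy sketch for the plain CTMC case (transient analysis only, not the paper's adaptive ε-optimal policy computation; the function name and the series-truncation scheme are illustrative assumptions):

```python
import numpy as np

def uniformize(Q, p0, t, eps=1e-8):
    """Transient distribution p(t) of a CTMC via uniformization.

    Q:  generator matrix (rows sum to 0, off-diagonals >= 0)
    p0: initial probability distribution over states
    t:  time horizon
    """
    Lam = max(-np.diag(Q))            # uniformization rate >= largest exit rate
    P = np.eye(len(Q)) + Q / Lam      # DTMC of the uniformized chain
    pi = np.zeros_like(p0, dtype=float)
    pk = np.asarray(p0, dtype=float)  # distribution after k jumps of P
    w = np.exp(-Lam * t)              # Poisson(Lam*t) weight for k = 0
    mass = 0.0
    k = 0
    # Sum the Poisson-weighted series until the truncated tail mass < eps.
    while mass < 1.0 - eps:
        pi += w * pk
        mass += w
        pk = pk @ P
        k += 1
        w *= Lam * t / k              # Poisson pmf recurrence
    return pi
```

For a two-state chain with Q = [[-1, 1], [2, -2]], the result for large t approaches the stationary distribution (2/3, 1/3). The same Poisson-weighting idea underlies the paper's method, which additionally maximizes over actions and adapts the step size to reach the prescribed precision.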
