The Transformation Method for Continuous-Time Markov Decision Processes

Piunovskiy A.; Zhang Y.

Abstract

In this paper, we show that a discounted continuous-time Markov decision process in Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov's forward equation, it is shown that the value function for both models is given by the minimal non-negative solution to the same Bellman equation. A verifiable necessary and sufficient condition for the finiteness of this value function is given, which induces a new condition for the non-explosion of the underlying controlled process.
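
The equivalence stated in the abstract can be illustrated on a toy model. The sketch below is not taken from the paper: it assumes finite state and action spaces (the paper works in Borel spaces) and assumes the transformation takes the usual embedded-chain form, in which a jump rate q(y|x,a) becomes the probability q(y|x,a) / (alpha + q_x(a)) and a reward rate r(x,a) becomes the one-step reward r(x,a) / (alpha + q_x(a)), where q_x(a) is the total jump rate out of x and alpha is the discount rate. The names `transform` and `minimal_solution` are hypothetical. Iterating the Bellman operator from the zero function gives an increasing sequence whose limit is the minimal non-negative solution.

```python
def transform(rates, rewards, alpha):
    """Build a discrete-time model from a finite continuous-time one (assumed form).

    rates[x][a][y]: jump rate from x to y != x under action a (entry y == x is ignored).
    rewards[x][a]:  non-negative reward rate in state x under action a.
    alpha:          discount rate, alpha > 0.
    Returns (p, c): sub-stochastic transition probabilities and one-step rewards.
    The missing mass alpha / (alpha + q_x(a)) is absorption with zero reward.
    """
    n = len(rates)
    p, c = [], []
    for x in range(n):
        p_x, c_x = [], []
        for a in range(len(rates[x])):
            total = sum(rates[x][a][y] for y in range(n) if y != x)
            denom = alpha + total
            p_x.append([rates[x][a][y] / denom if y != x else 0.0
                        for y in range(n)])
            c_x.append(rewards[x][a] / denom)
        p.append(p_x)
        c.append(c_x)
    return p, c


def minimal_solution(p, c, tol=1e-12, max_iter=100_000):
    """Value iteration started from zero on the discrete-time model."""
    n = len(p)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            max(c[x][a] + sum(p[x][a][y] * v[y] for y in range(n))
                for a in range(len(p[x])))
            for x in range(n)
        ]
        if max(abs(s - t) for s, t in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy check: state 0 earns reward rate 1 and jumps to absorbing state 1 at rate 1.
# With alpha = 1 the discounted reward from state 0 is 1 / (alpha + 1) = 0.5.
rates = [[[0.0, 1.0]], [[0.0, 0.0]]]
rewards = [[1.0], [0.0]]
p, c = transform(rates, rewards, alpha=1.0)
print(minimal_solution(p, c))  # → [0.5, 0.0]
```

In this finite setting with alpha > 0 the iteration is a contraction, so the minimal solution is also the unique bounded one; the distinction matters only for unbounded rates or rewards, which the toy model does not cover.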

Bibliographic record

Source: Journal of Optimization Theory and Applications, 2012, No. 2, 22 pages
Authors: Piunovskiy A.; Zhang Y.
Original format: PDF
Language: English
Chinese Library Classification: Applied mathematics
Keywords: Continuous-time Markov decision process; Discrete-time Markov decision process; History-dependent policies; Transformation method; Unbounded transition rates

Similar literature

1. The Transformation Method for Continuous-Time Markov Decision Processes [J]. Piunovskiy A., Zhang Y. Journal of Optimization Theory and Applications. 2012, No. 2
2. An approximation approach for the deviation matrix of continuous-time Markov processes with application to Markov decision theory [J]. Leder N., Heidergott B., Hordijk A. Operations Research: The Journal of the Operations Research Society of America. 2010, No. 4aPta1
3. Policy learning in continuous-time Markov decision processes using Gaussian Processes [J]. Bartocci Ezio, Bortolussi Luca, Brazdil Tomas. Performance Evaluation. 2017, No. nova
4. Sufficiency of Markov policies for continuous-time Markov decision processes and solutions to Kolmogorov's forward equation for jump Markov processes [C]. Feinberg E.A., Mandava M., Shiryaev A.N. IEEE Annual Conference on Decision and Control. 2013
5. Modern Methods of Hidden Markov Models and Partially Observable Markov Decision Processes in Biostatistics [D]. Xu, Zekun. 2020
6. Using model-based proposals for fast parameter inference on discrete state space continuous-time Markov processes [O]. C. M. Pooley, S. C. Bishop, G. Marion. 2015
7. An approximation approach for the deviation matrix of continuous-time Markov processes with application to Markov decision theory [O]. Heidergott, B.F., Hordijk, A., Leder, N. 2010
