First Passage Optimality for Continuous-Time Markov Decision Processes With Varying Discount Factors and History-Dependent Policies

Guo X.; Song X.; Zhang Y.

Abstract

This paper is an attempt to study the first passage optimality criterion for continuous-time Markov decision processes with state-dependent discount factors and history-dependent policies. The state space is denumerable, the action space is a Borel space, and the transition and reward rates are unbounded. Under suitable conditions, we show the existence of a deterministic stationary optimal policy, establish the Bellman (optimality) equation, to which the value function is the unique solution, and give the value and policy iteration algorithms for solving (at least approximating) the value function and an optimal policy. Furthermore, we give examples about reliability and controlled birth processes with killing to illustrate the potential applications of the results obtained here, and also to show the difference between the main results in this paper and those in the previous literature.
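
The value iteration idea mentioned in the abstract can be illustrated on a toy model. The sketch below is not the paper's algorithm or setting: it assumes a finite state space, bounded rates, and a strictly positive discount rate in every non-target state, and it uses a commonly stated form of the first passage Bellman equation, V(i) = max over a of [r(i,a) + sum over j != i of q(j|i,a) V(j)] / [alpha(i) + sum over j != i of q(j|i,a)], with V = 0 on the target set. The function name `first_passage_value_iteration` and the three-state model (states, actions "run", "slow", "fast", and all numbers) are hypothetical and chosen only for illustration.

```python
def first_passage_value_iteration(states, target, actions, rates, reward, alpha,
                                  tol=1e-12, max_iter=10_000):
    """Approximate the first passage discounted value on a small finite model.

    rates[(i, a)]  : dict {j: q(j|i,a)} of transition rates to states j != i
    reward[(i, a)] : reward rate r(i, a)
    alpha[i]       : state-dependent discount rate (assumed > 0 here)
    The value is fixed at 0 on the target set, where the process is stopped.
    """
    v = {i: 0.0 for i in states}
    policy = {}
    for _ in range(max_iter):
        new_v = {}
        for i in states:
            if i in target:
                new_v[i] = 0.0
                continue
            best_val, best_act = float("-inf"), None
            for a in actions[i]:
                out = rates[(i, a)]
                denom = alpha[i] + sum(out.values())
                val = (reward[(i, a)]
                       + sum(q_ij * v[j] for j, q_ij in out.items())) / denom
                if val > best_val:
                    best_val, best_act = val, a
            new_v[i], policy[i] = best_val, best_act
        gap = max(abs(new_v[i] - v[i]) for i in states)
        v = new_v
        if gap < tol:
            break
    return v, policy


# Hypothetical toy model: state 0 is the target (for example, system failure).
states = [0, 1, 2]
target = {0}
actions = {1: ["run"], 2: ["slow", "fast"]}
rates = {
    (1, "run"): {0: 1.0, 2: 1.0},
    (2, "slow"): {1: 1.0},
    (2, "fast"): {1: 3.0},
}
reward = {(1, "run"): 1.0, (2, "slow"): 2.0, (2, "fast"): 3.0}
alpha = {1: 1.0, 2: 2.0}  # discount rate varies with the state

value, policy = first_passage_value_iteration(
    states, target, actions, rates, reward, alpha
)
print(value, policy)
```

For this toy model the fixed point can be checked by hand: with "fast" chosen in state 2, V(1) = 2/3 and V(2) = 1, and "slow" would give a smaller value in state 2. The denumerable state space and unbounded rates treated in the paper need the conditions stated there and are not covered by this sketch.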

Bibliographic details

Source: IEEE Transactions on Automatic Control, 2014, Issue 1, pp. 163-174 (12 pages)
Authors: Guo X.; Song X.; Zhang Y.
Affiliation: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P.R. China
Original format: PDF
Language: English
Keywords: Continuous-time Markov decision process; first passage criterion; varying discount factor

Similar literature

1. First Passage Optimality and Variance Minimisation of Markov Decision Processes with Varying Discount Factors [J]. Wu Xiao, Guo Xianping. Journal of Applied Probability, 2015, Issue 2.
2. Discounted continuous-time Markov decision processes with unbounded rates and randomized history-dependent policies: the dynamic programming approach [J]. Alexey Piunovskiy, Yi Zhang. 4OR: Quarterly Journal of the Belgian, French and Italian Operations Research Societies, 2014, Issue 1.
3. Linear Programming and Constrained Average Optimality for General Continuous-Time Markov Decision Processes in History-Dependent Policies [J]. Xianping Guo, Yonghui Huang, Xinyuan Song. SIAM Journal on Control and Optimization, 2012, Issue 1.
4. An application to the finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors [C]. Xiao Wu, Junyu Zhang. World Congress on Intelligent Control and Automation, 2014.
5. Linear approximations for factored Markov decision processes [D]. Patrascu, Relu-Eugen. 2005.
6. Designing evaluation studies to optimally inform policy: what factors do policy-makers in China consider when making resource allocation decisions on healthcare worker training programmes? [O]. Shishi Wu, Helena Legido-Quigley, Julia Spencer. 2018.
7. On the First Passage $g$-Mean-Variance Optimality for Discounted Continuous-Time Markov Decision Processes [O]. Guo X, Huang X, Zhang Y. 2015.
