Optimal control of ergodic continuous-time Markov chains with average sample-path rewards

Guo XP; Cao XR

Abstract

In this paper we study continuous-time Markov decision processes with the average sample-path reward (ASPR) criterion and possibly unbounded transition and reward rates. We propose conditions on the system's primitive data for the existence of epsilon-ASPR-optimal (deterministic) stationary policies in a class of randomized Markov policies satisfying some additional continuity assumptions. The proof of this fact is based on the time discretization technique, the martingale stability theory, and the concept of potential. We also provide both policy and value iteration algorithms for computing, or at least approximating, the epsilon-ASPR-optimal stationary policies. We illustrate with examples our main results as well as the difference between the ASPR and the average expected reward criteria.

Bibliographic details

Source: SIAM Journal on Control and Optimization, 2005, Issue 1, 20 pages
Authors: Guo XP; Cao XR
Original format: PDF
Language: English
Chinese Library Classification: Applied mathematics
Keywords: average sample-path reward; continuous-time Markov chain; optimal stationary policy; policy and value iteration algorithms; CONTROLLED QUEUING-SYSTEMS; COUNTABLE STATE-SPACE; DECISION-PROCESSES; SENSITIVITY-ANALYSIS; BIAS OPTIMALITY; COST CRITERION; POTENTIALS; POLICIES; MODELS
