Journal: SIAM Journal on Control and Optimization

Optimal control of ergodic continuous-time Markov chains with average sample-path rewards

Abstract

In this paper we study continuous-time Markov decision processes with the average sample-path reward (ASPR) criterion and possibly unbounded transition and reward rates. We propose conditions on the system's primitive data for the existence of epsilon-ASPR-optimal (deterministic) stationary policies in a class of randomized Markov policies satisfying some additional continuity assumptions. The proof of this fact is based on the time discretization technique, the martingale stability theory, and the concept of potential. We also provide both policy and value iteration algorithms for computing, or at least approximating, the epsilon-ASPR-optimal stationary policies. We use examples to illustrate our main results as well as the difference between the ASPR and the average expected reward criteria.
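
The policy iteration idea mentioned in the abstract can be sketched on a toy model. The block below is not the paper's algorithm: it is the classical policy iteration for a finite-state, ergodic continuous-time Markov decision process under the average expected reward criterion, with none of the unbounded rates the paper allows. For a finite ergodic chain under a stationary policy, the long-run sample-path average equals the expected average almost surely, so the toy case does not separate the two criteria. All names (`q`, `r`, `solve_linear`, `evaluate`, `policy_iteration`) and all numeric rates and rewards are hypothetical and chosen only for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(n):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [x - factor * y for x, y in zip(M[row], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy, q, r):
    """Gain g and bias h (normalized so h[0] = 0) of a stationary policy.

    Solves g = r(i, f(i)) + sum_j q(j | i, f(i)) * h(j) for every state i.
    """
    n = len(policy)
    A, b = [], []
    for i in range(n):
        a = policy[i]
        # Unknown vector is (g, h[1], ..., h[n-1]); h[0] is fixed to 0.
        A.append([1.0] + [-q[i][a][j] for j in range(1, n)])
        b.append(r[i][a])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(q, r, max_iter=100):
    """Alternate policy evaluation and greedy improvement until stable."""
    n = len(q)
    policy = [0] * n
    for _ in range(max_iter):
        g, h = evaluate(policy, q, r)
        new_policy = []
        for i in range(n):
            def score(a):
                return r[i][a] + sum(q[i][a][j] * h[j] for j in range(n))
            best = max(range(len(q[i])), key=score)
            # Switch only on strict improvement, which avoids cycling on ties.
            if score(best) > score(policy[i]) + 1e-12:
                new_policy.append(best)
            else:
                new_policy.append(policy[i])
        if new_policy == policy:
            break
        policy = new_policy
    return policy, g, h


# Hypothetical two-state machine: state 0 = working, state 1 = broken.
# q[i][a] is the generator row (transition rates) in state i under action a.
q = [
    [[-1.0, 1.0], [-2.0, 2.0]],   # working: normal load / heavy load
    [[1.0, -1.0], [4.0, -4.0]],   # broken: slow repair / fast repair
]
# r[i][a] is the reward rate in state i under action a.
r = [
    [5.0, 6.0],
    [0.0, -1.0],
]

policy, gain, bias = policy_iteration(q, r)
print(policy, gain)
```

In this sketch the evaluation step solves a small linear system directly, which is only practical for a handful of states; a value iteration variant would replace that solve with repeated updates of a uniformized chain.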