Journal: Numerical Linear Algebra with Applications

Bounding reward measures of Markov models using the Markov decision processes

Abstract

For a Markov reward process in which upper and lower bounds on the transition rates and rewards are known, a new approach to bounding the expected reward is presented. A previous paper defined sharp bounds for this problem but proposed only an inefficient and unstable algorithm. This paper presents algorithms that compute the bounds by interpreting the problem as a Markov decision process. In this way, the well-known value iteration and policy iteration algorithms can be adopted to compute reward bounds in a stable and fairly efficient way. Different numerical techniques for computing the reward bounds are presented.
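
The idea of treating the unknown transition behaviour as a decision made in each state can be illustrated with a small value iteration sketch. This is not the paper's algorithm: it assumes a simplified discrete-time, discounted model with interval bounds on transition probabilities rather than on rates, and the names `extreme_row`, `bound_discounted_reward`, `P_lo`, `P_hi`, `r_lo`, `r_hi`, and `gamma` are hypothetical. In each sweep, every state picks the admissible transition row that is most favourable (upper bound) or least favourable (lower bound) for the current value estimate.

```python
from typing import List, Sequence


def extreme_row(lo: Sequence[float], hi: Sequence[float],
                values: Sequence[float], maximize: bool) -> List[float]:
    """Choose p with lo <= p <= hi and sum(p) == 1 that maximizes
    (or minimizes) the dot product of p and values.

    Assumes lo <= hi elementwise and sum(lo) <= 1 <= sum(hi).
    """
    p = list(lo)              # start from the lower bounds
    slack = 1.0 - sum(lo)     # probability mass still to be distributed
    # Give the remaining mass to the best successors first
    # (largest value when maximizing, smallest when minimizing).
    order = sorted(range(len(values)), key=lambda j: values[j],
                   reverse=maximize)
    for j in order:
        add = min(hi[j] - lo[j], slack)
        p[j] += add
        slack -= add
    return p


def bound_discounted_reward(P_lo, P_hi, r_lo, r_hi, gamma=0.9,
                            maximize=True, tol=1e-10, max_iter=100000):
    """Value iteration for an upper (maximize=True) or lower bound on the
    expected discounted reward under interval transition probabilities.

    Assumption: a discounted, discrete-time model, used for illustration.
    """
    n = len(r_lo)
    # The bound is monotone in the rewards, so use the extreme reward vector.
    r = r_hi if maximize else r_lo
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for i in range(n):
            p = extreme_row(P_lo[i], P_hi[i], v, maximize)
            new_v.append(r[i] + gamma * sum(pj * vj for pj, vj in zip(p, v)))
        diff = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if diff < tol:
            break
    return v


# Hypothetical two-state example with interval bounds.
P_lo = [[0.5, 0.3], [0.2, 0.6]]
P_hi = [[0.7, 0.5], [0.4, 0.8]]
r_lo = [1.0, 0.0]
r_hi = [1.2, 0.1]

upper = bound_discounted_reward(P_lo, P_hi, r_lo, r_hi, maximize=True)
lower = bound_discounted_reward(P_lo, P_hi, r_lo, r_hi, maximize=False)
print("upper bounds per state:", upper)
print("lower bounds per state:", lower)
```

The greedy step in `extreme_row` works because the inner problem is a linear program over a box intersected with the probability simplex, so moving the free mass toward the highest-valued (or lowest-valued) successors is optimal. A policy iteration variant would instead fix the chosen rows, solve the resulting linear system exactly, and then re-select rows.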