Bounding reward measures of Markov models using the Markov decision processes

Buchholz P.

Abstract

For a Markov reward process in which upper and lower bounds for the transition rates and rewards are known, a new approach to bounding the expected reward is presented. A previous paper defined sharp bounds for the problem but proposed only an inefficient and unstable algorithm. This paper presents algorithms that compute the bounds by interpreting the problem as a Markov decision process. In this way, the well-known value and policy iteration algorithms can be adopted to compute reward bounds in a stable and fairly efficient way. Different numerical techniques are presented for computing the reward bounds.
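
The idea of treating interval-bounded parameters as decisions can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract concerns transition rates and stationary analysis, whereas this sketch assumes a discrete-time chain with interval transition probabilities and a discounted reward, which keeps value iteration short and convergent. The names `extreme_distribution`, `bound_values`, `reward_bounds`, and the discount factor `gamma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def extreme_distribution(lower, upper, values, maximize=True):
    """Choose p with lower <= p <= upper and sum(p) == 1 that makes
    p @ values as large (or as small) as possible.

    Assumes sum(lower) <= 1 <= sum(upper), so a feasible p exists.
    """
    p = lower.copy()
    slack = 1.0 - p.sum()              # probability mass still to assign
    order = np.argsort(values)         # successor states, ascending by value
    if maximize:
        order = order[::-1]            # best-valued successors first
    for j in order:
        add = min(upper[j] - p[j], slack)
        p[j] += add
        slack -= add
    return p


def bound_values(P_lo, P_hi, r, gamma, maximize, tol=1e-10, max_iter=100_000):
    """Value iteration in which each state picks its extreme transition row."""
    n = len(r)
    v = np.zeros(n)
    for _ in range(max_iter):
        v_new = np.array([
            r[i] + gamma * extreme_distribution(P_lo[i], P_hi[i], v, maximize) @ v
            for i in range(n)
        ])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def reward_bounds(P_lo, P_hi, r_lo, r_hi, gamma=0.9):
    """Lower and upper bounds on the discounted expected reward per state."""
    lower = bound_values(P_lo, P_hi, r_lo, gamma, maximize=False)
    upper = bound_values(P_lo, P_hi, r_hi, gamma, maximize=True)
    return lower, upper


if __name__ == "__main__":
    # Illustrative two-state model; all numbers are made up.
    P_lo = np.array([[0.5, 0.3], [0.2, 0.6]])
    P_hi = np.array([[0.7, 0.5], [0.4, 0.8]])
    r_lo = np.array([1.0, 0.0])
    r_hi = np.array([1.2, 0.1])
    lo, hi = reward_bounds(P_lo, P_hi, r_lo, r_hi)
    print("lower bound per state:", lo)
    print("upper bound per state:", hi)
```

The upper bound pairs the largest rewards with the most favorable admissible transition row in each state, and the lower bound does the opposite. When the intervals collapse to single values, both bounds coincide with the ordinary discounted reward of the chain.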

Bibliographic details

Source: Numerical Linear Algebra with Applications, 2011, Issue 6, 12 pages
Author: Buchholz P.
Original format: PDF
Language: English
Chinese Library Classification: Theory of algebraic equations; linear algebra
Keywords: Bounds; Markov decision processes; Markov processes; Stationary analysis
