Computing semi-stationary optimal policies for multichain semi-Markov decision processes

Mondal Prasenjit


Abstract

We consider semi-Markov decision processes (SMDPs) with finite state and action spaces and a general multichain structure. A form of limiting ratio average (undiscounted) reward is used as the criterion for comparing policies. The main result is that the value vector and a pure optimal semi-stationary policy (i.e., a policy that depends only on the initial state and the current state) for such an SMDP can be computed directly from optimal solutions of a finite set of linear programming (LP) problems, one for each state, so that the number of LPs equals the number of states. More precisely, we prove that the single LP associated with a fixed initial state provides the value and an optimal pure stationary policy of the SMDP started in that state. The relation between the set of feasible solutions of each LP and the set of stationary policies is also analyzed. Worked examples illustrate the algorithm.
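The abstract does not state the LP formulation, so the sketch below does not reproduce the paper's method. It only illustrates why an optimal policy can depend on the initial state. It assumes the ratio average is the limit of expected cumulative reward divided by expected cumulative sojourn time (one common form; the paper's exact form may differ), truncates that limit at a finite horizon, and replaces the LPs with brute-force enumeration of pure stationary policies on a tiny hypothetical multichain SMDP. The names `SMDP`, `ratio_average`, and `best_semi_stationary` are illustrative only.

```python
from itertools import product

# Hypothetical multichain SMDP: state -> {action: (reward, expected sojourn time, {next_state: prob})}.
# State 0 is transient; states 1 and 2 are absorbing and form separate recurrent classes.
SMDP = {
    0: {"go": (0.0, 1.0, {1: 0.5, 2: 0.5})},
    1: {"stay": (5.0, 1.0, {1: 1.0})},
    2: {"a": (0.1, 0.1, {2: 1.0}), "b": (20.0, 10.0, {2: 1.0})},
}


def ratio_average(smdp, policy, start, horizon=10_000):
    """Truncated estimate of E[sum of rewards] / E[sum of sojourn times] (assumed criterion)
    under the pure stationary policy `policy` (state -> action), starting from `start`."""
    dist = {s: 0.0 for s in smdp}
    dist[start] = 1.0
    total_reward = total_time = 0.0
    for _ in range(horizon):
        nxt = {s: 0.0 for s in smdp}
        for s, p in dist.items():
            if p == 0.0:
                continue
            reward, tau, trans = smdp[s][policy[s]]
            total_reward += p * reward  # expected reward earned at this decision epoch
            total_time += p * tau       # expected time spent until the next epoch
            for j, q in trans.items():
                nxt[j] += p * q         # propagate the state distribution one step
        dist = nxt
    return total_reward / total_time


def best_semi_stationary(smdp, horizon=10_000):
    """For each initial state, pick the best pure stationary policy by exhaustive search.
    The resulting map (initial state -> policy) is a pure semi-stationary policy."""
    states = sorted(smdp)
    policies = [
        dict(zip(states, choice))
        for choice in product(*(sorted(smdp[s]) for s in states))
    ]
    result = {}
    for start in states:
        best = max(policies, key=lambda pol: ratio_average(smdp, pol, start, horizon))
        result[start] = (ratio_average(smdp, best, start, horizon), best)
    return result
```

Started in state 2, the search chooses action `b` there (ratio 2 versus 1). Started in state 0, it chooses `a` in state 2, because the short sojourn of `a` dilutes the high-ratio class {1} less than the long sojourn of `b` does. No single stationary policy is optimal for both initial states. Enumeration grows exponentially with the number of states, which is one reason an LP-based approach is attractive.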

Bibliographic details

Source: Annals of Operations Research, 2020, No. 2, pp. 843-865 (23 pages)
Author: Mondal Prasenjit
Affiliation: Department of Mathematics, Government General Degree College, Ranibandh, Bankura 722135, India
Original format: PDF
Language: English
Keywords: Semi-Markov decision processes; Limiting ratio average reward; Multichain structure; Pure optimal semi-stationary policies; Linear programming

Similar documents

1. Algorithm to identify and compute average optimal policies in multichain Markov decision processes [J]. Leizarowitz A. Mathematics of Operations Research, 2003, No. 3.

2. Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes [J]. Adelman Daniel, Mancini Angelo J. Mathematics of Operations Research, 2016, No. 4.

3. On average reward semi-Markov decision processes with a general multichain structure [J]. Jianyong L, Xiaobo Z. Mathematics of Operations Research, 2004, No. 2.

4. Optimum Maintenance Policy with Inspection by Semi-Markov Decision Processes [C]. Ge Haifeng, Tomasevicz Curtis L., Asgarpoor Sohrab. North American Power Symposium, 2007.

5. A New Reinforcement Learning Algorithm with Fixed Exploration for Semi-Markov Decision Processes [D]. Encapera, Angelo Michael. 2017.

6. Learning to maximize reward rate: a model based on semi-Markov decision processes [O]. Arash Khodadadi, Pegah Fakhari, Jerome R. Busemeyer. 2014.

7. Regret-optimal policies in absorbing semi-Markov decision processes with multiple constraints (The Development of Information and Decision Processes) [O]. Kadota Yoshinobu, Kurano Masami, Yasuda Masami. 2006.

8. Theory for Semi-Markov Decision Processes with Unbounded Costs and Its Application to the Optimal Control of Queueing Systems [R]. Orkenyi, P. 1976.
