A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward

ABHIJIT GOSAVI; TAPAS K. DAS; SUDEEP SARKAR


Abstract

Many problems of sequential decision making under uncertainty, whose underlying probabilistic structure has a Markov chain, can be set up as Markov Decision Problems (MDPs). However, when their underlying transition mechanism cannot be characterized by the Markov chain alone, the problems may be set up as Semi-Markov Decision Problems (SMDPs). The framework of dynamic programming has been used extensively in the literature to solve such problems. An alternative framework that exists in the literature is that of Learning Automata (LA). This framework can be combined with simulation to develop convergent LA algorithms for solving MDPs under long-run cost (or reward). A very attractive feature of this framework is that it avoids a major stumbling block of dynamic programming: having to compute the one-step transition probability matrices of the Markov chain for every possible action of the decision-making process. In this paper, we extend this framework to the more general SMDP. We also present numerical results on a case study from the domain of preventive maintenance in which the decision-making problem is modeled as an SMDP. An algorithm based on LA theory is employed, which may be implemented in a simulator as a solution method. It produces satisfactory results in all the numerical examples studied.
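To illustrate the general idea of pairing learning automata with a simulator, the following is a minimal sketch and not the algorithm from the paper. It assumes a hypothetical two-state, two-action SMDP (`simulate_step`), one automaton per state, and a linear reward-inaction update whose feedback is the reward earned per unit time between successive visits to a state. The names `simulate_step`, `run_automata`, `MAX_RATE`, and all numeric values are assumptions made for illustration. No transition probability matrix is ever computed; the automata only see simulated transitions.

```python
import random

# Assumed upper bound on reward per unit time (max reward 2.0 / min sojourn 0.5),
# used only to scale the feedback signal into [0, 1].
MAX_RATE = 4.0

def simulate_step(state, action, rng):
    """Hypothetical SMDP simulator: returns (next_state, reward, sojourn_time)."""
    # Action 0: low reward, long sojourn. Action 1: high reward, short sojourn.
    reward, mean_time = (1.0, 2.0) if action == 0 else (2.0, 1.0)
    sojourn = rng.uniform(0.5 * mean_time, 1.5 * mean_time)
    next_state = rng.choice([0, 1])
    return next_state, reward, sojourn

def run_automata(num_steps=50000, learning_rate=0.01, seed=0):
    """Run one learning automaton per state; return the action probabilities."""
    rng = random.Random(seed)
    num_states, num_actions = 2, 2
    probs = [[1.0 / num_actions] * num_actions for _ in range(num_states)]
    last_action = [None] * num_states      # action chosen at the previous visit
    reward_at_visit = [0.0] * num_states   # cumulative reward at the previous visit
    time_at_visit = [0.0] * num_states     # cumulative time at the previous visit
    total_reward, total_time = 0.0, 0.0
    state = 0
    for _ in range(num_steps):
        if last_action[state] is not None:
            # Feedback: reward per unit time since this state was last visited.
            gained = total_reward - reward_at_visit[state]
            elapsed = total_time - time_at_visit[state]
            beta = min(1.0, (gained / elapsed) / MAX_RATE)
            chosen = last_action[state]
            # Linear reward-inaction update: shift probability toward the
            # previously chosen action in proportion to the feedback.
            for a in range(num_actions):
                if a == chosen:
                    probs[state][a] += learning_rate * beta * (1.0 - probs[state][a])
                else:
                    probs[state][a] -= learning_rate * beta * probs[state][a]
        # Sample the next action from this state's automaton.
        action = rng.choices(range(num_actions), weights=probs[state])[0]
        last_action[state] = action
        reward_at_visit[state] = total_reward
        time_at_visit[state] = total_time
        next_state, reward, sojourn = simulate_step(state, action, rng)
        total_reward += reward
        total_time += sojourn
        state = next_state
    return probs

if __name__ == "__main__":
    for s, p in enumerate(run_automata()):
        print(f"state {s}: action probabilities {p}")
```

In this toy model, action 1 earns more reward in less time in both states, so the probability mass in each automaton should drift toward action 1 as the simulation proceeds.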


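Bibliographic details

Source: IIE Transactions | 2004, No. 6 | pp. 557-567 | 11 pages
Authors: ABHIJIT GOSAVI; TAPAS K. DAS; SUDEEP SARKAR
Author affiliation: Department of Industrial Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA
Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language of text: English
Chinese Library Classification: General industrial technology
Keywords: none listed
Date added to database: 2022-08-18 03:51:12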

Similar literature


1. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning [J]. Tapas K. Das, Abhijit Gosavi, Sridhar Mahadevan. Management Science: Journal of the Institute of Management Sciences, 1999, No. 4.

2. Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward [J]. Jan Kretinsky, Fabian Michel, Lukas Michel. JMLR: Workshop and Conference Proceedings, 2020, No. 2010.

3. Semi-Markov decision processes with limiting ratio average rewards [J]. Sinha Sagnik, Mondal Prasenjit. Journal of Mathematical Analysis and Applications, 2017, No. 1.

4. Average Reward Reinforcement Learning for Semi-Markov Decision Processes [C]. Jiayuan Yang, Yanjie Li, Haoyao Chen. International Conference on Neural Information Processing, 2017.

5. Modeling Team-Compatibility Factors Using a Semi-Markov Decision Process: A Framework for Performance Analysis in Soccer [D]. Jarvandi, Ali. 2014.

6. Learning to maximize reward rate: a model based on semi-Markov decision processes [O]. Arash Khodadadi, Pegah Fakhari, Jerome R. Busemeyer. 2014.

7. Solving Semi-Markov Decision Problems using Average Reward Reinforcement Learning [O]. Tapas Das, Abhijit Gosavi, Sridhar Mahadevan. 1999.

8. An Algorithm for Average Costs Denumerable State Semi-Markov Decision Problems with Applications to Controlled Production and Queueing Systems [R]. Tijms, H. C. 1978.
