Annals of Operations Research

Computing semi-stationary optimal policies for multichain semi-Markov decision processes


Abstract

We consider semi-Markov decision processes (SMDPs) with finite state and action spaces and a general multichain structure. A form of limiting ratio average (undiscounted) reward is the criterion for comparing different policies. The main result is that the value vector and a pure optimal semi-stationary policy (i.e., a policy which depends only on the initial state and the current state) for such an SMDP can be computed directly from an optimal solution of a finite set (whose cardinality equals the number of states) of linear programming (LP) problems. To be more precise, we prove that the single LP associated with a fixed initial state provides the value and an optimal pure stationary policy of the corresponding SMDP. The relation between the set of feasible solutions of each LP and the set of stationary policies is also analyzed. Examples are worked out to describe the algorithm.
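The following is a minimal sketch of why the initial state matters under a multichain structure, not the paper's algorithm. It assumes that, for a fixed stationary policy, the ratio average reward from each starting state is the Cesaro-limit expected reward divided by the Cesaro-limit expected sojourn time; the abstract does not state the exact definition, so this form is an assumption. The matrix `P`, the vectors `r` and `tau`, and both function names are hypothetical illustrations.

```python
import numpy as np

def cesaro_limit(P, n_terms=20000):
    """Approximate the Cesaro limit (1/N) * sum_{k<N} P^k by direct averaging."""
    n = P.shape[0]
    total = np.zeros_like(P, dtype=float)
    power = np.eye(n)              # P^0
    for _ in range(n_terms):
        total += power
        power = power @ P          # next power of P
    return total / n_terms

def ratio_average_reward(P, r, tau):
    """Assumed criterion: long-run expected reward per unit of expected time,
    evaluated separately for every initial state."""
    P_star = cesaro_limit(P)
    return (P_star @ r) / (P_star @ tau)

# Hypothetical chain induced by one fixed stationary policy:
# states 0 and 1 are absorbing (two recurrent classes), state 2 is transient.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.5, 0.0]])
r = np.array([4.0, 1.0, 0.0])      # expected reward per visit (assumed)
tau = np.array([2.0, 1.0, 1.0])    # expected sojourn time per visit (assumed)

phi = ratio_average_reward(P, r, tau)
print(phi)
```

In this toy chain the result differs by starting state, and the value from the transient state is not the plain average of the two recurrent-class values because the criterion is a ratio. This is consistent with the abstract's approach of associating one LP with each fixed initial state.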
