SIAM Journal on Control and Optimization

A CONVEX PROGRAMMING APPROACH FOR DISCRETE-TIME MARKOV DECISION PROCESSES UNDER THE EXPECTED TOTAL REWARD CRITERION

Abstract

In this work, we study discrete-time Markov decision processes (MDPs) under constraints, with Borel state and action spaces, where all the performance functions take the same form, namely the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDP. It will be shown that the values of the constrained control problem and of the associated convex program coincide. Moreover, if there exists an optimal solution to the convex program, then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that, in the framework of constrained control problems, the supremum of the ETRs over the set of randomized policies is equal to the supremum of the ETRs over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to cover cases that have not yet been addressed in the literature. Examples are presented to illustrate our results.
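For orientation, the convex program in this type of setting is typically a linear program over occupation measures; the following is a minimal sketch of that standard formulation, with notation assumed here rather than taken from the paper (ν the initial distribution, Q the transition kernel, r the reward function, c_i and d_i the constraint functions and bounds):

\[
\sup_{\mu \ge 0} \int_{X \times A} r \, d\mu
\quad \text{s.t.} \quad
\mu(B \times A) = \nu(B) + \int_{X \times A} Q(B \mid x,a)\, \mu(dx,da) \;\; \forall B \in \mathcal{B}(X),
\qquad
\int_{X \times A} c_i \, d\mu \le d_i, \;\; i = 1, \dots, q.
\]

Here μ plays the role of the expected state-action occupation measure induced by a policy; when an optimal μ exists, disintegrating it as μ(dx,da) = \hat{\mu}(dx)\,\varphi(da \mid x) produces a stationary randomized policy φ, which is the mechanism behind the optimality statement in the abstract.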
