Properties of the Optimality Equation and Optimal Policies in Discrete Time Markov Decision Processes and Their Applications

Abstract

This paper investigates properties of the optimality equation and optimal policies in discrete time Markov decision processes with expected discounted total rewards. Under conditions where the model is well defined and the optimality equation holds, it is shown that the optimal value function is always the smallest solution of the optimality equation, and that it is the unique solution under an additional weak condition. Moreover, the structure of optimal policies is discussed. Finally, these properties are applied to state feedback control of discrete event systems, with a numerical example.
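To make the term "optimality equation" concrete, the following is a minimal sketch on a toy finite model, not the paper's general setting or method. It assumes the standard discounted form V(s) = max_a { r(s, a) + beta * sum_t p(t | s, a) V(t) }; the two-state model, the rewards, the discount factor `BETA`, and all function names are hypothetical. In this finite, bounded-reward case with beta < 1 the equation has exactly one solution, so the sketch only illustrates what the equation states, not the non-uniqueness question the abstract addresses.

```python
# Toy two-state, two-action discounted MDP. All numbers are hypothetical.
BETA = 0.9  # discount factor, assumed to lie in [0, 1)

# P[s][a] lists (next_state, probability) pairs; R[s][a] is the one-step reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def q_value(V, s, a):
    """One-step reward plus discounted expected value of the next state."""
    return R[s][a] + BETA * sum(p * V[t] for t, p in P[s][a])


def bellman_backup(V):
    """Apply the right-hand side of the optimality equation once."""
    return {s: max(q_value(V, s, a) for a in P[s]) for s in P}


def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate the backup from V = 0 until successive iterates differ by < tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        V_new = bellman_backup(V)
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


def greedy_policy(V):
    """Pick, in each state, an action attaining the maximum in the equation."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


if __name__ == "__main__":
    V = value_iteration()
    print(V, greedy_policy(V))
```

For this toy model the iteration settles near V(0) = 19 and V(1) = 20, and the greedy policy chooses action 1 in both states. Applying `bellman_backup` to the result leaves it unchanged up to the tolerance, which is what it means for a function to solve the optimality equation.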
