Source: 電子情報通信学会技術研究報告. 回路とシステム. Circuits and Systems

Properties of the optimality equation and optimal policies in discrete time Markov decision processes

Abstract

This paper investigates the properties of the optimality equation and of optimal policies in discrete-time Markov decision processes with expected discounted total rewards, under the weak conditions that the model is well defined and the optimality equation holds. The optimal value function is characterized as a solution of the optimality equation, and the structure of optimal policies is also given.
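
To make the objects named in the abstract concrete, the sketch below solves the discounted optimality equation V(i) = max_a [ r(i, a) + beta * sum_j p(j | i, a) V(j) ] by value iteration and reads off a greedy policy. It assumes a finite state and action space with bounded rewards, which is a much more restrictive setting than the weak conditions the paper works under, and it is not the paper's method. The names `value_iteration`, `P`, `r` and `beta`, and the two-state example, are hypothetical and chosen only for illustration.

```python
def value_iteration(P, r, beta, tol=1e-10, max_iter=10_000):
    """Approximate the solution of the discounted optimality equation.

    Assumed (illustrative) conventions, not taken from the paper:
      P[a][i][j] -- probability of moving from state i to j under action a
      r[i][a]    -- one-step reward for taking action a in state i
      beta       -- discount factor, 0 <= beta < 1
    """
    n_states = len(r)
    n_actions = len(r[0])

    def q(i, a, V):
        # Right-hand side of the optimality equation for a fixed action a.
        return r[i][a] + beta * sum(P[a][i][j] * V[j] for j in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(q(i, a, V) for a in range(n_actions))
                 for i in range(n_states)]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:
            break

    # A policy that attains the maximum in every state (greedy w.r.t. V).
    policy = [max(range(n_actions), key=lambda a: q(i, a, V))
              for i in range(n_states)]
    return V, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
r = [
    [0.0, 1.0],  # state 0: reward 0 for staying, 1 for switching
    [2.0, 0.0],  # state 1: reward 2 for staying, 0 for switching
]
V, policy = value_iteration(P, r, beta=0.9)
print(V, policy)
```

In this toy model the computed values are approximately 19 and 20, and the greedy policy switches out of state 0 and stays in state 1. Both values satisfy the optimality equation, which is the finite-model analogue of the characterization the abstract describes.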