Journal: Probability in the Engineering and Informational Sciences
Existence of optimal stationary policies in finite dynamic programs with nonnegative rewards: an alternative approach

Abstract

This article concerns Markov decision chains with finite state and action spaces, in which a control policy is graded via the expected total-reward criterion associated with a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result that is obtained via a limit process using the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.
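
For reference, the criterion and the existence result described in the abstract can be written as below. The notation is an assumption chosen for illustration and is not taken from the article: S denotes the finite state space, R the nonnegative reward function, (X_t, A_t) the state-action process, \pi an arbitrary policy, and f a stationary policy.

```latex
% Expected total reward of policy \pi from initial state x, and the optimal value function.
% Because R >= 0, the series is well defined, though it may equal +\infty.
\[
V(x,\pi) = E_x^{\pi}\!\left[\sum_{t=0}^{\infty} R(X_t, A_t)\right],
\qquad
V^{*}(x) = \sup_{\pi} V(x,\pi), \qquad x \in S.
\]

% The classical existence theorem, as summarized in the abstract.
\[
V^{*}(x) < \infty \ \text{for all } x \in S
\;\Longrightarrow\;
\text{there is a stationary policy } f \text{ such that } V(x,f) = V^{*}(x) \ \text{for all } x \in S.
\]
```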