Initially Stationary epsilon-Optimal Policies in Continuous Time Markov Decision Chains

Abstract

The asymptotic behavior of continuous-time-parameter Markov decision chains is studied. It is shown that the maximal total expected t-period reward, less t times the maximal long-run average return rate, converges as t approaches infinity for every initial state. This result is used to establish the existence of policies that are simultaneously epsilon-optimal for all process durations and that are stationary except possibly for a final, finite segment. Further, the length of the final segment depends on epsilon, but not on t for large enough t, while the initial stationary part of the policy is independent of both epsilon and t. The decision rules comprising the initially stationary part of these policies, called preferred, are characterized. Finite algorithms for finding preferred decision rules are given under varying hypotheses on the underlying structure of the system, though the general case remains unsolved. (Author)
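
The convergence result and the notion of epsilon-optimality described above can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not come from the report: V^t(s) denotes the maximal total expected t-period reward from initial state s, g*(s) the maximal long-run average return rate, V^t_pi(s) the expected t-period reward under a policy pi, and w(s) the (unnamed in the abstract) limit.

\[
\lim_{t \to \infty} \bigl[ V^{t}(s) - t\, g^{*}(s) \bigr] = w(s) \quad \text{exists for every initial state } s,
\]
\[
\pi \text{ is } \varepsilon\text{-optimal for duration } t \iff V^{t}_{\pi}(s) \ge V^{t}(s) - \varepsilon \quad \text{for all } s .
\]

Under this reading, the abstract's existence claim is that a single policy satisfies the second condition for every duration t at once, using a fixed stationary decision rule until only a final segment remains, where the segment's length depends on epsilon alone once t is large enough.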