Journal: SIAM Journal on Control and Optimization

STRONG UNIFORM VALUE IN GAMBLING HOUSES AND PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES



Abstract

In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely, the strong uniform value. This solves two open problems. First, it shows that for any epsilon > 0, the decision maker has a pure strategy sigma which is epsilon-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any epsilon > 0, the decision maker can guarantee the limit of the n-stage value minus epsilon in the infinite problem, where the payoff is the expectation of the inferior limit of the time average payoff.
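
The two statements above can be written out symbolically. This is only a sketch: the notation is assumed here and does not come from the abstract. Take g_m to be the stage-m payoff, x the initial state, E^x_sigma the expectation under strategy sigma starting from x, and v_n(x) the n-stage value, with the payoff normalized as a time average.

```latex
% Assumed notation (not from the abstract): g_m, x, \sigma, v_n, v^*.
% n-stage value, with time-averaged payoff:
\[
  v_n(x) \;=\; \sup_{\sigma}\; \mathbb{E}^{x}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} g_m\right],
  \qquad
  v^{*}(x) \;=\; \lim_{n\to\infty} v_n(x).
\]
% First statement: a single pure strategy is epsilon-optimal in every long enough n-stage problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists\, N,\ \forall n \ge N:\quad
  \mathbb{E}^{x}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} g_m\right] \;\ge\; v_n(x) - \varepsilon .
\]
% Second statement: the limit value minus epsilon is guaranteed for the liminf time-average payoff.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma:\quad
  \mathbb{E}^{x}_{\sigma}\!\left[\liminf_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n} g_m\right] \;\ge\; v^{*}(x) - \varepsilon .
\]
```

The second inequality is the stronger requirement in one respect: the inferior limit is taken inside the expectation, so the guarantee concerns the long-run average along realized plays and not just the expected payoff at each finite horizon.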
