Mathematics of Operations Research

Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities


Abstract

This paper describes sufficient conditions for the existence of optimal policies for partially observable Markov decision processes (POMDPs) with Borel state, observation, and action sets, when the goal is to minimize the expected total costs over finite or infinite horizons. For infinite-horizon problems, one-step costs are either discounted or assumed to be nonnegative. Action sets may be noncompact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semicontinuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to completely observable Markov decision processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether the transition probabilities for the COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous, if transition probabilities of the underlying Markov decision process are weakly continuous and observation probabilities for the POMDP are continuous in total variation. Moreover, the continuity in total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples.
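To make the reduction concrete, the following is a minimal numerical sketch (not from the paper) of value iteration on the COMDP whose states are posterior state distributions, for a hypothetical finite POMDP with discounted costs. All model data below (P, Q, c, gamma, the belief grid) are illustrative assumptions; the paper's results cover the far more general setting of Borel state, observation, and action sets, noncompact action sets, and possibly unbounded one-step costs.

```python
# Sketch only: belief-state (COMDP) value iteration for a small, finite POMDP
# with discounted one-step costs. Hypothetical model data, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nO = 4, 3, 3           # finite state, action, and observation sets
gamma = 0.9                    # discount factor

# Transition kernel P[a, s, s'], observation kernel Q[a, s', o], one-step cost c[s, a].
P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)
Q = rng.random((nA, nS, nO)); Q /= Q.sum(axis=2, keepdims=True)
c = rng.random((nS, nA))

def belief_update(b, a, o):
    """Posterior state distribution (the COMDP state) after taking action a and observing o."""
    pred = b @ P[a]                      # predicted next-state distribution
    post = pred * Q[a, :, o]             # Bayes step: weight by observation likelihood
    z = post.sum()                       # z = probability of observing o given (b, a)
    return (post / z, z) if z > 0 else (pred, 0.0)

# Value iteration on a finite grid of belief points, with nearest-neighbour lookup
# as a crude approximation of the value function on the belief simplex.
beliefs = rng.dirichlet(np.ones(nS), size=100)

def nearest(b):
    return int(np.argmin(np.linalg.norm(beliefs - b, axis=1)))

V = np.zeros(len(beliefs))
for _ in range(100):
    V_new = np.empty_like(V)
    for i, b in enumerate(beliefs):
        q = np.empty(nA)
        for a in range(nA):
            expected_cost = b @ c[:, a]
            future = 0.0
            for o in range(nO):
                b_next, prob_o = belief_update(b, a, o)
                future += prob_o * V[nearest(b_next)]
            q[a] = expected_cost + gamma * future   # Bellman operator of the COMDP
        V_new[i] = q.min()                          # minimize expected total discounted cost
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new
```

In this finite setting the belief update is automatically continuous in the belief; the paper's contribution is to identify when the analogous COMDP transition kernel remains weakly continuous on Borel spaces, namely when the underlying transition probabilities are weakly continuous and the observation probabilities are continuous in total variation.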
