SIAM Journal on Control and Optimization

AVERAGE COST OPTIMALITY INEQUALITY FOR MARKOV DECISION PROCESSES WITH BOREL SPACES AND UNIVERSALLY MEASURABLE POLICIES


Abstract

We consider average-cost Markov decision processes (MDPs) with Borel state and action spaces and universally measurable policies. For the nonnegative cost model and an unbounded cost model with a Lyapunov-type stability character, we introduce a set of new conditions under which we prove the average cost optimality inequality (ACOI) via the vanishing discount factor approach. Unlike most existing results on the ACOI, our result does not require any compactness and continuity conditions on the MDPs. Instead, the main idea is to use the almost-uniform-convergence property of a pointwise convergent sequence of measurable functions as asserted in Egoroff's theorem. Our conditions are formulated in order to exploit this property. Among others, we require that for each state, on selected subsets of actions at that state, the state transition stochastic kernel is majorized by finite measures. We combine this majorization property of the transition kernel with Egoroff's theorem to prove the ACOI.
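For reference, the two objects named in the abstract can be written out in their usual textbook form. This is only a sketch in generic notation: the symbols $\rho$, $h$, $c$, $q$, $A(x)$, $(X,\mathcal{A},\mu)$, $f_n$, $f$, and $E$ are assumptions introduced here, not taken from the paper, whose exact formulation and conditions may differ.

```latex
% Average cost optimality inequality (ACOI), generic form.
% Assumed notation: state x, admissible actions A(x), one-stage cost c,
% transition kernel q(dy | x, a), constant rho, measurable function h.
\[
  \rho + h(x) \;\ge\; \inf_{a \in A(x)}
  \Bigl\{ c(x,a) + \int h(y)\, q(dy \mid x, a) \Bigr\}
  \qquad \text{for all states } x .
\]

% Egoroff's theorem, standard statement on a finite measure space.
Let $(X,\mathcal{A},\mu)$ be a measure space with $\mu(X) < \infty$, and let
$f_n, f : X \to \mathbb{R}$ be measurable with $f_n(x) \to f(x)$ for
$\mu$-almost every $x$. Then for every $\varepsilon > 0$ there exists
$E \in \mathcal{A}$ with $\mu(E) < \varepsilon$ such that
\[
  \lim_{n \to \infty} \; \sup_{x \in X \setminus E}
  \bigl| f_n(x) - f(x) \bigr| = 0 .
\]
```

Note that finiteness of $\mu$ is a hypothesis of Egoroff's theorem, which is consistent with the abstract's requirement that the transition kernel be majorized by finite measures.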
