Source: Operations Research and Its Applications (foreign-language conference proceedings)

Weighted Discounted Markov Decision Processes with Perturbation

Abstract

In this paper we consider weighted-reward discounted Markov decision processes (MDPs) with perturbation. We prove the existence of an optimal simple ultimately deterministic policy for the process Γ_0(β_1, ..., β_K). We also prove that there exists a δ-optimal simple ultimately deterministic policy in the perturbed weighted MDP for all d ∈ [0, ε*). Finally, we prove the following result: if π is an optimal policy of Γ_D(β_1, ..., β_K), then for any δ > 0 there exists an ε(δ)-neighbourhood B(D) of D such that, whenever D_1 ∈ B(D), π is a δ-optimal policy of Γ_(D_1)(β_1, ..., β_K).
