...
Applied Mathematics Letters

A note on deterministic approximation of discounted Markov decision processes


Abstract

We study the approximation of a small-noise Markov decision process $x_t = F(x_{t-1}, a_t, \xi_t^{\varepsilon})$, $t = 1, 2, \ldots$, by means of its deterministic counterpart $\tilde{x}_t = F(\tilde{x}_{t-1}, a_t, s_0)$, $t = 1, 2, \ldots$, where $s_0$ is a fixed point of the disturbance metric space $(S, r)$. The total discounted cost is used as the optimality criterion. Assuming that $\delta(\varepsilon) := E\, r(\xi_1^{\varepsilon}, s_0) \to 0$ as $\varepsilon \to 0$, we prove the convergence of optimal policies, estimate the rate of convergence of the optimal costs, and give an upper bound (depending on $\delta(\varepsilon)$) for the stability index, which measures the excess cost incurred by replacing the optimal policy with its deterministic approximation.
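The objects in the abstract can be made concrete on a finite toy problem. The sketch below is only an illustration under assumptions that do not come from the paper. It uses a clipped integer state grid, dynamics $F(x, a, s) = x + a + s$, a hypothetical cost, discount factor 0.9, and a three-point law for $\xi^{\varepsilon}$ chosen so that $\delta(\varepsilon) = \varepsilon$. It solves both the deterministic counterpart (disturbance frozen at $s_0 = 0$) and the noisy problem by value iteration, then reports the gap between the two optimal costs. It also reports a stability index, which is taken here (as an assumption) to be the supremum over initial states of the extra discounted cost of running the deterministic optimal policy in the noisy system.

```python
import numpy as np

# Hypothetical toy model (not from the paper): states {0, ..., N-1},
# actions {-1, 0, +1}, disturbance space S = integers with r(s, s') = |s - s'|.
N = 21
TARGET = 10
ACTIONS = (-1, 0, 1)
ALPHA = 0.9  # discount factor
S0 = 0       # the fixed point s_0 used by the deterministic counterpart


def F(x, a, s):
    """Dynamics x_t = F(x_{t-1}, a_t, s), clipped to the state grid."""
    return min(max(x + a + s, 0), N - 1)


def cost(x, a):
    """Hypothetical one-step cost: distance to TARGET plus a control penalty."""
    return abs(x - TARGET) + 0.5 * abs(a)


def noise(eps):
    """Hypothetical law of xi^eps: -1 or +1 with probability eps/2 each, else s_0."""
    return {-1: eps / 2, 0: 1.0 - eps, 1: eps / 2}


def delta(eps):
    """delta(eps) = E r(xi^eps, s_0); equals eps for this noise law."""
    return sum(p * abs(s - S0) for s, p in noise(eps).items())


def bellman(V, x, a, dist):
    """c(x, a) + ALPHA * E V(F(x, a, xi)) for a disturbance law `dist`."""
    return cost(x, a) + ALPHA * sum(p * V[F(x, a, s)] for s, p in dist.items())


def value_iteration(dist, tol=1e-10):
    """Approximate optimal discounted cost V* and a greedy stationary policy."""
    V = np.zeros(N)
    while True:
        Q = np.array([[bellman(V, x, a, dist) for a in ACTIONS] for x in range(N)])
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)  # policy as indices into ACTIONS
        V = V_new


def evaluate(policy, dist, tol=1e-10):
    """Discounted cost of a fixed stationary policy (indices into ACTIONS)."""
    V = np.zeros(N)
    while True:
        V_new = np.array([bellman(V, x, ACTIONS[policy[x]], dist) for x in range(N)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def optimal_cost_gap(eps):
    """sup_x |V*_eps(x) - V*_0(x)|: noisy vs deterministic optimal costs."""
    V_eps, _ = value_iteration(noise(eps))
    V_det, _ = value_iteration({S0: 1.0})
    return float(np.max(np.abs(V_eps - V_det)))


def stability_index(eps):
    """sup_x of the excess cost from using the deterministic policy under noise."""
    _, f_det = value_iteration({S0: 1.0})    # optimal policy of the counterpart
    V_opt, _ = value_iteration(noise(eps))   # optimal cost of the noisy MDP
    V_used = evaluate(f_det, noise(eps))     # cost of f_det in the noisy MDP
    return float(np.max(V_used - V_opt))


for eps in (0.0, 0.05, 0.2):
    print(eps, delta(eps), optimal_cost_gap(eps), stability_index(eps))
```

At $\varepsilon = 0$ both quantities vanish up to the value-iteration tolerance. For positive $\varepsilon$ the printout shows how they behave in this toy model only. The paper's actual bounds and constants are not reproduced here.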
