Policy Iteration for Factored MDPs


Abstract

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset of variables. An approximate factored value function for a particular policy can be computed using approximate dynamic programming, but this approach (and others) can only produce an approximation relative to a distance metric which is weighted by the stationary distribution of the current policy. This type of weighted projection is ill-suited to policy improvement. We present a new approach to value determination that uses a simple closed-form computation to compute a least-squares decomposed approximation to the value function for any weights directly. We then use this value determination algorithm as a subroutine in a policy iteration process. We show that, under reasonable restrictions, the policies induced by a factored value function can be compactly represented as a decision list, and can be manipulated efficiently in a policy iteration process. We also present a method for computing error bounds for decomposed value functions using a variable-elimination algorithm for function optimization. The complexity of all of our algorithms depends on the factorization of the system dynamics and of the approximate value function.
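To illustrate the closed-form, least-squares value determination the abstract describes, here is a minimal sketch (not the paper's factored implementation): given a basis matrix `A` whose columns are the restricted basis functions, the weights `w` that minimize the Bellman residual for a fixed policy with transition matrix `P` solve an ordinary least-squares problem. The names `least_squares_value`, `A`, `P`, `R`, and `gamma` are illustrative assumptions, not from the paper.

```python
import numpy as np

def least_squares_value(A, P, R, gamma):
    """Least-squares value determination for a fixed policy (sketch).

    A: (n_states x k) basis matrix, one column per basis function.
    P: (n_states x n_states) transition matrix of the fixed policy.
    R: (n_states,) reward vector; gamma: discount factor.
    Minimizes ||(A - gamma * P @ A) w - R||_2 over the weights w,
    i.e. the unweighted Bellman residual, in closed form.
    """
    M = A - gamma * (P @ A)
    w, *_ = np.linalg.lstsq(M, R, rcond=None)
    return w

# Tiny 2-state check: with the identity basis the approximation is exact,
# so A @ w should equal the true value function (I - gamma P)^{-1} R.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
R = np.array([1.0, 0.0])
A = np.eye(2)
gamma = 0.9
w = least_squares_value(A, P, R, gamma)
V_exact = np.linalg.solve(np.eye(2) - gamma * P, R)
```

In the paper's setting the basis functions are restricted to small variable subsets, so the matrices above are never formed explicitly; the factored structure makes the same least-squares computation tractable.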
