【24h】

Max-norm Projections for Factored MDPs

机译:分解MDP的最大范数投影

获取原文

摘要

Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution algorithms have been proposed, many of which use a linear combination of basis functions as a compact approximation to the value function. Almost all of these algorithms use an approximation based on the (weighted) L_2-norm (Euclidean distance); this approach prevents the application of standard convergence results for MDP algorithms, all of which are based on max-norm. This paper makes two contributions. First, it presents the first approximate MDP solution algorithms ― both value and policy iteration ― that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds. Second, it shows how these algorithms can be applied efficiently in the context of factored MDPs, where the transition model is specified using a dynamic Bayesian network.
机译:马尔可夫决策过程(MDP)为不确定性下的规划提供了一个连贯的数学框架。但是,精确的MDP解决方案算法需要操纵值函数,该函数为系统中的每个状态指定一个值。大多数现实世界中的MDP太大,以致于这种表示不可行,从而阻止了使用精确的MDP算法。已经提出了各种近似解算法,其中许多使用基函数的线性组合作为对值函数的紧凑近似。几乎所有这些算法都使用基于(加权)L_2范数(欧几里得距离)的近似值。这种方法会阻止将标准收敛结果应用于所有基于max-norm的MDP算法。本文有两个贡献。首先,它介绍了使用最大范数投影的第一个近似MDP解决方案算法(值和策略迭代),从而直接优化了获得最佳误差范围所需的数量。其次,它显示了如何在因式MDP(使用动态贝叶斯网络指定过渡模型)的情况下有效地应用这些算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号