Max-norm Projections for Factored MDPs

机译：分解MDP的最大范数投影

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution algorithms have been proposed, many of which use a linear combination of basis functions as a compact approximation to the value function. Almost all of these algorithms use an approximation based on the (weighted) L_2-norm (Euclidean distance); this approach prevents the application of standard convergence results for MDP algorithms, all of which are based on max-norm. This paper makes two contributions. First, it presents the first approximate MDP solution algorithms ― both value and policy iteration ― that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds. Second, it shows how these algorithms can be applied efficiently in the context of factored MDPs, where the transition model is specified using a dynamic Bayesian network.

机译：马尔可夫决策过程（MDP）为不确定性下的规划提供了一个连贯的数学框架。但是，精确的MDP解决方案算法需要操纵值函数，该函数为系统中的每个状态指定一个值。大多数现实世界中的MDP太大，以致于这种表示不可行，从而阻止了使用精确的MDP算法。已经提出了各种近似解算法，其中许多使用基函数的线性组合作为对值函数的紧凑近似。几乎所有这些算法都使用基于（加权）L_2范数（欧几里得距离）的近似值。这种方法会阻止将标准收敛结果应用于所有基于max-norm的MDP算法。本文有两个贡献。首先，它介绍了使用最大范数投影的第一个近似MDP解决方案算法（值和策略迭代），从而直接优化了获得最佳误差范围所需的数量。其次，它显示了如何在因式MDP（使用动态贝叶斯网络指定过渡模型）的情况下有效地应用这些算法。

著录项

来源
《Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01) Vol.1, Aug 4-10, 2001, Seattle, Washington》|2001年|p.673-680|共8页
会议地点
作者
Carlos Guestrin; Daphne Koller; Ronald Parr;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类自动化技术、计算机技术;
关键词
入库时间 2022-08-26 14:07:26

相似文献

外文文献
中文文献
专利

1. A Stochastic Approximation Method with Max-Norm Projections and its Applications to the Q-Learning Algorithm [J] . SUMIT KUNNUMKAL, HUSEYIN TOPALOGLU ACM Transactions on Modeling and Computer Simulation . 2010,第3期

机译：最大范数投影的随机逼近方法及其在Q学习算法中的应用
2. Efficient approximate linear programming for factored MDPs [J] . Chen Feng, Cheng Qiang, Dong Jianwu, 高分子論文集 . 2015,第auga期

机译：分解式MDP的高效近似线性编程
3. Discovering hidden structure in factored MDPs [J] . Andrey Kolobov, Mausam, Daniel S. Weld Artificial intelligence . 2012,第期

机译：发现分解MDP中的隐藏结构
4. Max-norm Projections for Factored MDPs [C] . Carlos Guestrin, Daphne Koller, Ronald Parr International Joint Conference on Artificial Intelligence . 2007

机译：因子MDP的MAX-NOMP投影
5. Active Cyber Deception and Attacker Intent Recognition Using Factored Interactive POMDPs [D] . Shinde, Aditya P. 2020

机译：有源网络欺骗和攻击者使用因子互动POMDPS的意图识别
6. Sustained Growth of the Impact Factors of MDPI Open Access Journals [O] . Dietrich Rordorf 2012

机译：MDPI开放获取期刊影响因素的持续增长
7. A Stochastic Approximation Method with Max-Norm Projections and its Applications to the Q-Learning Algorithm [O] . Sumit Kunnumkal, Huseyin Topaloglu 2009

机译：最大范数投影的随机逼近方法及其在Q学习算法中的应用

Max-norm Projections for Factored MDPs

摘要

著录项

相似文献

相关主题

期刊订阅