
Greedy Linear Value-approximation for Factored Markov Decision Processes



Abstract

Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? And where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation, showing that this is an inherently hard problem. Nevertheless, we provide a branch and bound method for calculating the Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L_1 error, but can also empirically reduce Bellman error.
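
As a concrete illustration of the linear-programming approach the abstract refers to, the following is a minimal sketch on a toy tabular MDP. All numbers, the basis functions Phi, and the state-relevance weights alpha are invented for illustration; the paper's methods target factored MDPs and do not enumerate states explicitly as this sketch does. The sketch solves the standard approximate linear program for the basis-function weights and then reports the Bellman error of the resulting linear value function, the quantity whose minimization the abstract discusses.

```python
# Minimal sketch (not the paper's algorithm): approximate linear programming (ALP)
# for a small tabular MDP with a fixed basis, followed by a Bellman-error check
# of the linear value function V = Phi @ w. All data below is hypothetical.
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# Toy MDP: 4 states, 2 actions.
R = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [1.0, 0.0]])                       # R[s, a]
P = np.zeros((2, 4, 4))                          # P[a, s, s']
P[0] = np.array([[0.9, 0.1, 0.0, 0.0],
                 [0.1, 0.8, 0.1, 0.0],
                 [0.0, 0.1, 0.8, 0.1],
                 [0.0, 0.0, 0.1, 0.9]])
P[1] = np.array([[0.1, 0.9, 0.0, 0.0],
                 [0.0, 0.1, 0.9, 0.0],
                 [0.0, 0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.0, 0.1]])

# Hand-picked basis: a constant feature plus one indicator feature.
Phi = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [1.0, 1.0],
                [1.0, 1.0]])                     # Phi[s, i]
alpha = np.ones(4) / 4.0                         # state-relevance weights

# ALP: minimise alpha^T Phi w  subject to  (Phi w)(s) >= R(s,a) + gamma * P[a,s,:] @ Phi w
# for every state-action pair, i.e. (Phi - gamma * P[a] @ Phi) w >= R[:, a].
A_ub, b_ub = [], []
for a in range(2):
    A_ub.append(-(Phi - gamma * P[a] @ Phi))     # linprog expects A_ub @ x <= b_ub
    b_ub.append(-R[:, a])
res = linprog(c=alpha @ Phi,
              A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
              bounds=[(None, None)] * Phi.shape[1])
w = res.x
V = Phi @ w

# Bellman error of the linear approximation: max_s |V(s) - (T V)(s)|.
TV = np.max(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)
print("weights:", w, " Bellman error:", np.max(np.abs(V - TV)))
```

Note that the LP minimizes a weighted sum of state values subject to one-sided Bellman constraints rather than minimizing the Bellman error itself, which is one reason the accuracy questions raised in the abstract arise.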
