
Greedy Linear Value-approximation for Factored Markov Decision Processes



Abstract

Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? And where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation, showing that this is an inherently hard problem. Nevertheless, we provide a branch and bound method for calculating the Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L_1 error, but can also empirically reduce Bellman error.
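
As a concrete illustration of the linear-programming approach the abstract refers to, the following is a minimal sketch on a toy tabular MDP. All numbers, the basis functions Phi, and the state-relevance weights alpha are invented for illustration; the paper's methods target factored MDPs and do not enumerate states explicitly as this sketch does. The sketch solves the standard approximate linear program for the basis-function weights and then reports the Bellman error of the resulting linear value function, the quantity whose minimization the abstract discusses.

```python
# Minimal sketch (not the paper's algorithm): approximate linear programming (ALP)
# for a small tabular MDP with a fixed basis, followed by a Bellman-error check
# of the linear value function V = Phi @ w. All data below is hypothetical.
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# Toy MDP: 4 states, 2 actions.
R = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [1.0, 0.0]])                       # R[s, a]
P = np.zeros((2, 4, 4))                          # P[a, s, s']
P[0] = np.array([[0.9, 0.1, 0.0, 0.0],
                 [0.1, 0.8, 0.1, 0.0],
                 [0.0, 0.1, 0.8, 0.1],
                 [0.0, 0.0, 0.1, 0.9]])
P[1] = np.array([[0.1, 0.9, 0.0, 0.0],
                 [0.0, 0.1, 0.9, 0.0],
                 [0.0, 0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.0, 0.1]])

# Hand-picked basis: a constant feature plus one indicator feature.
Phi = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [1.0, 1.0],
                [1.0, 1.0]])                     # Phi[s, i]
alpha = np.ones(4) / 4.0                         # state-relevance weights

# ALP: minimise alpha^T Phi w  subject to  (Phi w)(s) >= R(s,a) + gamma * P[a,s,:] @ Phi w
# for every state-action pair, i.e. (Phi - gamma * P[a] @ Phi) w >= R[:, a].
A_ub, b_ub = [], []
for a in range(2):
    A_ub.append(-(Phi - gamma * P[a] @ Phi))     # linprog expects A_ub @ x <= b_ub
    b_ub.append(-R[:, a])
res = linprog(c=alpha @ Phi,
              A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
              bounds=[(None, None)] * Phi.shape[1])
w = res.x
V = Phi @ w

# Bellman error of the linear approximation: max_s |V(s) - (T V)(s)|.
TV = np.max(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)
print("weights:", w, " Bellman error:", np.max(np.abs(V - TV)))
```

Note that the LP minimizes a weighted sum of state values subject to one-sided Bellman constraints rather than minimizing the Bellman error itself, which is one reason the accuracy questions raised in the abstract arise.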
