Model-Based Least-Squares Policy Evaluation

Abstract

A popular form of policy evaluation for large Markov Decision Processes (MDPs) is the least-squares temporal difference (TD) method. Least-squares TD methods handle large MDPs by requiring feature vectors, supplied from prior knowledge, that form a set of basis vectors and compress the system down to a tractable size. Model-based methods have largely been ignored in favour of model-free TD algorithms because of two perceived drawbacks: slower computation and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over a model-based method in three distinct ways. First, it provides a new model-based approximate policy evaluation method that produces solutions in less computation time than Boyan's least-squares TD method. Second, it introduces a new algorithm that derives basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All three algorithms require storing a model, but they remain competitive with model-free TD methods in computation time and accuracy.
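
To make the contrast in the abstract concrete, the sketch below compares a textbook LSTD(0) solve with a generic model-based least-squares solve on the same sampled transitions. It is not the paper's algorithm: the toy 5-state chain, the two-dimensional features, and the function names `solve`, `lstd`, and `model_based` are all assumptions made for illustration. The model-based variant stores transition counts and reward sums (the extra model storage the abstract mentions) and then solves the projected Bellman equation against that estimated model.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def lstd(transitions, phi, gamma):
    """Model-free LSTD(0): accumulate A and b directly from samples."""
    k = len(phi[0])
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, r, s2 in transitions:
        for i in range(k):
            b[i] += phi[s][i] * r
            for j in range(k):
                A[i][j] += phi[s][i] * (phi[s][j] - gamma * phi[s2][j])
    return solve(A, b)

def model_based(transitions, phi, gamma, n_states):
    """Fit a count-based model, then solve the projected Bellman equation."""
    counts = [[0] * n_states for _ in range(n_states)]  # stored model: counts
    rsum = [0.0] * n_states                             # stored model: reward sums
    for s, r, s2 in transitions:
        counts[s][s2] += 1
        rsum[s] += r
    k = len(phi[0])
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s in range(n_states):
        visits = sum(counts[s])
        if visits == 0:
            continue  # unvisited states contribute nothing
        # Expected next-state feature vector under the estimated model.
        nxt = [sum(counts[s][t] * phi[t][j] for t in range(n_states)) / visits
               for j in range(k)]
        for i in range(k):
            b[i] += phi[s][i] * rsum[s]
            for j in range(k):
                A[i][j] += visits * phi[s][i] * (phi[s][j] - gamma * nxt[j])
    return solve(A, b)

# Hypothetical toy problem: a biased random walk on a ring of 5 states,
# with reward 1 whenever the walk enters state 0.
rng = random.Random(0)
n_states, gamma = 5, 0.9
phi = [[1.0, s / (n_states - 1)] for s in range(n_states)]
transitions, s = [], 0
for _ in range(2000):
    s2 = (s + 1) % n_states if rng.random() < 0.7 else (s - 1) % n_states
    transitions.append((s, 1.0 if s2 == 0 else 0.0, s2))
    s = s2

w_td = lstd(transitions, phi, gamma)
w_mb = model_based(transitions, phi, gamma, n_states)
```

With a count-based model and visit-frequency weighting, the two solves build the same linear system, so `w_td` and `w_mb` agree up to floating-point rounding. The difference lies in what is stored and when the work is done: the model-based version loops over states rather than samples when forming the system, which is one plausible place where computational savings of the kind the abstract claims could come from.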
