A popular approach to policy evaluation in large Markov Decision Processes (MDPs) is the least-squares temporal difference (TD) method. Least-squares TD methods handle large MDPs by requiring prior-knowledge feature vectors that form a set of basis vectors, compressing the system to a tractable size. Model-based methods have largely been passed over in favour of model-free TD algorithms because of two perceived drawbacks: slower computation and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over a model-based method in three distinct ways. First, it presents a new model-based approximate policy-evaluation method that produces solutions in less computation time than Boyan's least-squares TD method. Second, it introduces a new algorithm for deriving basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All of the proposed algorithms require model storage, but they remain computationally competitive with model-free temporal difference methods while matching their accuracy.
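To make the baseline concrete, the following is a minimal sketch of least-squares TD(0) policy evaluation, the family of methods the abstract refers to. It is an illustrative example, not the paper's algorithm: the feature map `phi`, the ridge regularizer, and the toy two-state chain are all assumptions introduced here. The method accumulates the statistics A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s) r from sampled transitions, then solves A w = b for the value-function weights.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9, ridge=1e-6):
    """Least-squares TD(0) sketch (hypothetical example).

    transitions: list of (state, reward, next_state) samples under a fixed policy
    phi: maps a state to its feature vector (the basis compressing the MDP)
    """
    k = len(phi(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # A += phi (phi - gamma phi')^T
        b += f * r                            # b += phi * r
    # small ridge term guards against a singular A on limited data
    return np.linalg.solve(A + ridge * np.eye(k), b)

# Toy 2-state chain: state 0 -> state 1 (reward 0), state 1 -> state 1 (reward 1).
phi = lambda s: np.eye(2)[s]  # one-hot features, so LSTD is exact here
data = [(0, 0.0, 1), (1, 1.0, 1)] * 50
w = lstd(data, phi, gamma=0.5)
# Bellman check: V(1) = 1 + 0.5*V(1) = 2, and V(0) = 0 + 0.5*V(1) = 1
```

With one-hot features the weight vector equals the true value function, which makes the solved system easy to verify by hand; with fewer basis vectors than states, LSTD returns the least-squares projection of the values onto the basis instead.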