SIAM Journal on Control and Optimization

Least squares temporal difference methods: An analysis under general conditions

Abstract

We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) with the least squares temporal difference (LSTD) algorithm, LSTD(λ), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain different from the one corresponding to the policy under evaluation. We establish for the discounted cost criterion that LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other properties of the iterates involved in the algorithm, including convergence in mean and boundedness. Our analysis draws on theories of both finite space Markov chains and weak Feller Markov chains on a topological space. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of a TD(λ) algorithm and extensions to MDPs with compact state and action spaces, as well as a convergence proof of a new LSTD algorithm with state-dependent λ-parameters.
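
For readers unfamiliar with the algorithm being analyzed, the sketch below shows the standard on-policy LSTD(λ) recursion with linear function approximation. It is illustrative only, not taken from the paper: the paper's exploration-enhanced setting, where observations come from a Markov chain different from the target policy's chain, requires off-policy corrections that this sketch omits, and all names here (lstd_lambda, the matrix A, the vector b, the trace z) are ours.

    import numpy as np

    def lstd_lambda(features, states, costs, gamma, lam):
        """Single-trajectory LSTD(lambda) estimate of linear value-function weights.

        features : (n_states, d) array, row i = feature vector phi(i)
        states   : observed trajectory s_0, ..., s_T (integer state indices)
        costs    : c_0, ..., c_{T-1}, cost incurred on transition s_t -> s_{t+1}
        gamma    : discount factor in (0, 1)
        lam      : trace parameter lambda in [0, 1]
        """
        d = features.shape[1]
        A = np.zeros((d, d))
        b = np.zeros(d)
        z = np.zeros(d)                          # eligibility trace
        for t in range(len(states) - 1):
            phi_t = features[states[t]]
            phi_next = features[states[t + 1]]
            z = gamma * lam * z + phi_t          # z_t = (gamma*lam) z_{t-1} + phi(s_t)
            A += np.outer(z, phi_t - gamma * phi_next)
            b += z * costs[t]
        # Solve A theta = b; lstsq handles a (near-)singular A gracefully.
        theta, *_ = np.linalg.lstsq(A, b, rcond=None)
        return theta

    # Sanity check with tabular (one-hot) features on a 3-state chain,
    # where the LSTD(lambda) estimate should approach V = (I - gamma*P)^{-1} c.
    rng = np.random.default_rng(0)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    c = np.array([1.0, 0.0, 2.0])
    traj = [0]
    for _ in range(200_000):
        traj.append(rng.choice(3, p=P[traj[-1]]))
    traj = np.asarray(traj)
    theta = lstd_lambda(np.eye(3), traj, c[traj[:-1]], gamma=0.95, lam=0.7)
    print(theta)
    print(np.linalg.solve(np.eye(3) - 0.95 * P, c))  # exact discounted costs

With tabular features the algorithm's fixed point coincides with the exact discounted cost vector, which the check above illustrates; with general features, LSTD(λ) instead converges to the solution of a projected multistep Bellman equation.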