
Incremental least-squares temporal difference learning.

Abstract

Sequential decision making is a challenging problem for the artificial intelligence community. It can be modeled as an agent interacting with an environment according to its policy. Policy iteration methods are a popular approach that interleaves two stages: policy evaluation, which computes the desirability of each state with respect to the current policy, and policy improvement, which improves the policy with respect to the computed state values. The effectiveness of this approach depends heavily on the effectiveness of policy evaluation, which is the focus of this dissertation. The per-time-step complexity of traditional methods such as temporal difference learning (TD) is sublinear in the number of features. They therefore scale to large environments; however, they use training data relatively inefficiently and so require a large number of sample interactions. The least-squares TD (LSTD) method addresses the data inefficiency of TD by using the sum of the TD updates over all past experience. This makes LSTD a formidable algorithm for problems where data is limited or expensive to gather. However, the computational cost of LSTD cripples its applicability in most large environments. We introduce an incremental version of the LSTD method, called iLSTD, for online policy evaluation in large problems.

On each time step, iLSTD uses the sum TD update vector in a gradient fashion, selecting and descending along a limited set of dimensions. We show that if a sparse feature representation is used, the per-time-step complexity of iLSTD is linear in the number of features, whereas that of LSTD is quadratic. This allows iLSTD to scale up to large environments with many features, where LSTD cannot be applied. On the other hand, because iLSTD takes advantage of all of the data on each time step, it requires far less data than the TD method.
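A single iLSTD time step as described above can be sketched as follows. This is a minimal dense sketch, not the thesis's exact algorithm: the function name, the dense NumPy arrays, and the parameter m (the number of dimensions descended per step) are illustrative assumptions. The thesis's linear per-step cost relies on exploiting sparse feature vectors; the dense version below costs quadratic time and is shown only to make the update explicit.

```python
import numpy as np

def ilstd_step(A, b, mu, theta, phi, phi_next, reward, gamma, alpha, m):
    """One iLSTD time step (dense illustrative sketch).

    A, b accumulate the LSTD statistics; mu = b - A @ theta is the
    sum TD update vector, maintained incrementally.
    """
    # Accumulate the LSTD statistics from the new transition.
    dA = np.outer(phi, phi - gamma * phi_next)
    A += dA
    b += reward * phi
    # Incrementally update mu = b - A @ theta.
    mu += reward * phi - dA @ theta
    # Gradient-style descent along the m dimensions with largest |mu|.
    for _ in range(m):
        j = int(np.argmax(np.abs(mu)))
        step = alpha * mu[j]
        theta[j] += step
        mu -= step * A[:, j]   # keep mu consistent with the new theta
    return A, b, mu, theta
```

With sparse feature vectors, dA touches only a few rows and columns, so maintaining A, b, and mu (and descending a bounded number of dimensions) can be done in time linear in the number of features, which is the point of the algorithm.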
Empirical results in the Boyan chain and mountain car environments show the superiority of iLSTD with respect to TD and its speed advantage with respect to LSTD. We also extend iLSTD with eligibility traces, resulting in iLSTD(lambda), and show that the additional computation does not change the linear per-time-step complexity. Additionally, we investigate the performance and convergence properties of iLSTD under different dimension-selection mechanisms. Finally, we discuss the limitations of this study.
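The eligibility-trace extension mentioned above changes only how the statistics are accumulated. A hedged sketch, assuming the standard accumulating-trace form of the LSTD(lambda) statistics; the helper name and signature are illustrative, not taken from the thesis.

```python
import numpy as np

def ilstd_lambda_stats(z, phi, phi_next, reward, gamma, lam):
    """Trace-weighted updates to the A and b statistics.

    z is the eligibility trace vector. The returned dA and db replace
    the one-step updates in the iLSTD loop; with sparse features the
    per-step cost remains linear in the number of features.
    """
    z = gamma * lam * z + phi                  # decay and accumulate the trace
    dA = np.outer(z, phi - gamma * phi_next)   # trace-weighted outer product
    db = reward * z                            # trace-weighted reward term
    return z, dA, db
```

Setting lam = 0 recovers the one-step updates, so iLSTD(lambda) strictly generalizes the basic algorithm.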

Bibliographic information

  • Author: Geramifard, Alborz
  • Institution: University of Alberta (Canada)
  • Degree-granting institution: University of Alberta (Canada)
  • Subject: Computer science
  • Degree: M.Sc.
  • Year: 2007
  • Pagination: 63 p.
  • Total pages: 63
  • Format: PDF
  • Language: eng
