Machine Learning

An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method



Abstract

In this paper, we provide two new stable online algorithms for the prediction problem in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using a linear function approximation architecture, with memory and computation costs that scale quadratically in the size of the feature set. The algorithms employ a multi-timescale stochastic approximation variant of the popular cross entropy optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided, and we supplement the theoretical results with experimental comparisons. The algorithms perform consistently well on many RL benchmark problems with regard to computational efficiency, accuracy, and stability.
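The abstract does not reproduce the algorithms themselves. As a point of reference, the sketch below shows the generic cross entropy method applied to linear value prediction in its simplest batch form: a diagonal-Gaussian search distribution over the weight vector is repeatedly refit to the elite candidates under an empirical mean-squared TD-error objective. The objective, the Gaussian parameterization, and all hyperparameters (`pop_size`, `elite_frac`, etc.) are illustrative assumptions, not the paper's method; the paper's contribution is to replace this batch elite refit with online, multi-timescale stochastic approximation updates whose convergence is analyzed via the ODE method.

```python
import numpy as np

def ce_value_prediction(phi, r, phi_next, gamma=0.95, n_iters=50,
                        pop_size=200, elite_frac=0.1, seed=0):
    """Batch cross-entropy search for linear value-function weights.

    Minimizes the empirical mean-squared TD error
        J(w) = mean_t ( r_t + gamma * phi(s_{t+1})^T w - phi(s_t)^T w )^2
    over a Gaussian search distribution N(mu, diag(sigma^2)).

    phi, phi_next : (T, d) feature matrices for states s_t and s_{t+1}
    r             : (T,) observed rewards
    """
    rng = np.random.default_rng(seed)
    d = phi.shape[1]
    mu, sigma = np.zeros(d), np.ones(d)
    n_elite = max(1, int(elite_frac * pop_size))

    for _ in range(n_iters):
        # Sample candidate weight vectors from the search distribution.
        W = mu + sigma * rng.standard_normal((pop_size, d))
        # Mean-squared TD error of each candidate (lower is better).
        td = r[None, :] + gamma * (phi_next @ W.T).T - (phi @ W.T).T
        scores = np.mean(td ** 2, axis=1)
        # Refit the Gaussian to the elite (lowest-error) candidates;
        # the small floor on sigma keeps the search from collapsing.
        elite = W[np.argsort(scores)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu
```

Given a sampled trajectory, `w = ce_value_prediction(phi, r, phi_next)` returns weights whose linear value estimate is `phi @ w`. Note that this batch loop revisits the whole sample each iteration; an online variant of the kind the paper proposes would instead update `mu` and `sigma` incrementally per transition, on slower timescales than the score estimates they depend on.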
