Conference: Machine learning (ML95)

Residual Algorithms: Reinforcement Learning with Function Approximation


Abstract

A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basis-function system, a memory-based learning system, or even a linear function-approximation system. A new class of algorithms, residual gradient algorithms, is proposed, which perform gradient descent on the mean squared Bellman residual, guaranteeing convergence. It is shown, however, that they may learn very slowly in some cases. A larger class of algorithms, residual algorithms, is proposed that has the guaranteed convergence of the residual gradient algorithms, yet can retain the fast learning speed of direct algorithms. In fact, both direct and residual gradient algorithms are shown to be special cases of residual algorithms, and it is shown that residual algorithms can combine the advantages of each approach. The direct, residual gradient, and residual forms of value iteration, Q-learning, and advantage learning are all presented. Theoretical analysis is given explaining the properties these algorithms have, and simulation results are given that demonstrate these properties.
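
The relationship between the three update rules can be illustrated with a minimal sketch. It assumes a linear value function V(x) = w · feat(x), a single observed transition, and a mixing weight `phi` in [0, 1]. The update formula is the commonly cited form of the residual update and is not stated in the abstract itself. The function and variable names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def residual_update(w, feat_x, feat_next, reward, gamma, alpha, phi):
    """One weight update for a linear value function V(x) = w . feat(x).

    Assumed form (not given in the abstract):
        w <- w - alpha * delta * (phi * gamma * grad V(x') - grad V(x))
    phi = 0 reduces to the direct update,
    phi = 1 reduces to gradient descent on 0.5 * delta**2 (residual gradient).
    """
    # Bellman residual for the observed transition x -> x'.
    delta = reward + gamma * dot(w, feat_next) - dot(w, feat_x)
    # For a linear approximator, the gradient of V(x) w.r.t. w is feat(x).
    return [
        w_i - alpha * delta * (phi * gamma * fn_i - fx_i)
        for w_i, fx_i, fn_i in zip(w, feat_x, feat_next)
    ]


# Illustrative values only; they are not taken from the paper.
w = [0.5, -0.2]
feat_x = [1.0, 0.0]
feat_next = [0.0, 1.0]

direct = residual_update(w, feat_x, feat_next, 1.0, 0.9, 0.1, phi=0.0)
resgrad = residual_update(w, feat_x, feat_next, 1.0, 0.9, 0.1, phi=1.0)
mixed = residual_update(w, feat_x, feat_next, 1.0, 0.9, 0.1, phi=0.5)
```

With `phi=0.0` only the weight tied to the current state moves, while `phi=1.0` also adjusts the weight tied to the successor state. Intermediate values of `phi` interpolate between the two.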
