Journal: Intelligent Automation and Soft Computing

A GRADIENT DESCENT SARSA(λ) ALGORITHM BASED ON THE ADAPTIVE REWARD-SHAPING MECHANISM


Abstract

Based on an adaptive reward-shaping mechanism, we propose a novel gradient descent (GD) Sarsa(λ) algorithm to address poor initial performance and slow convergence in reinforcement learning tasks with continuous state spaces. An adaptive normalized radial basis function (ANRBF) network is used to shape the reward. The reward-shaping mechanism propagates model knowledge to the learner in the form of an additional reward signal, so that initial performance and convergence speed can be improved effectively. A function approximation algorithm named ANRBF-GD-Sarsa(λ) is proposed based on the ANRBF network. The convergence of ANRBF-GD-Sarsa(λ) is analyzed theoretically. Experiments demonstrate the good initial performance and fast convergence of the proposed algorithm.
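
To make the ingredients named in the abstract concrete, here is a minimal Python sketch of normalized RBF features, a shaped reward, and one gradient-descent Sarsa(λ) update for a linear action-value function. It is not the paper's ANRBF-GD-Sarsa(λ): the abstract does not give the form of the shaping term or how the network adapts, so the potential-based shaping formula, the fixed centers and width, and all function and parameter names (`nrbf_features`, `shaped_reward`, `sarsa_lambda_step`, `alpha`, `gamma`, `lam`) are assumptions made for illustration.

```python
import math


def nrbf_features(state, centers, width):
    """Normalized Gaussian RBF activations for a scalar state; they sum to 1."""
    raw = [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def shaped_reward(reward, potential_s, potential_next, gamma):
    """Assumed potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential_next - potential_s


def sarsa_lambda_step(weights, traces, feats, action, reward,
                      next_feats, next_action, done,
                      alpha=0.1, gamma=0.99, lam=0.9):
    """One gradient-descent Sarsa(lambda) update for a linear Q.

    weights and traces are lists (one per action) of per-feature floats and
    are modified in place. Returns the TD error.
    """
    q = sum(w * f for w, f in zip(weights[action], feats))
    q_next = 0.0 if done else sum(
        w * f for w, f in zip(weights[next_action], next_feats))
    delta = reward + gamma * q_next - q
    for a in range(len(weights)):
        for i in range(len(feats)):
            # Decay every trace, then accumulate the gradient of Q(s, a),
            # which for a linear Q is the feature vector of the taken action.
            traces[a][i] *= gamma * lam
            if a == action:
                traces[a][i] += feats[i]
            weights[a][i] += alpha * delta * traces[a][i]
    return delta
```

In a training loop under these assumptions, the traces would be reset to zero at the start of each episode, and the value returned by `shaped_reward` would be passed as `reward` to `sarsa_lambda_step` in place of the raw environment reward.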
