Principled reward shaping for reinforcement learning via Lyapunov stability theory

Abstract

Reinforcement learning (RL) suffers from the difficulty of designing reward functions and from the large number of iterations required to reach convergence, so accelerating the training process is important. In this paper, we propose a Lyapunov-function-based approach to shaping the reward function that effectively accelerates training. Furthermore, the shaped reward function yields a convergence guarantee via stochastic approximation, an invariant optimality condition based on the Bellman equation, and an asymptotically unbiased policy. Experiments on a range of RL benchmarks demonstrate the effectiveness of the proposed method: it substantially accelerates convergence and achieves a higher accumulated reward. (C) 2020 Elsevier B.V. All rights reserved.
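The abstract does not state the shaping formula. As a minimal sketch, assume the standard potential-based form of Ng, Harada and Russell (1999), F(s, s') = gamma * Phi(s') - Phi(s), with the potential taken as Phi(s) = -V(s) for a candidate Lyapunov function V. The quadratic V(s) = ||s||^2, the goal of driving the state to the origin, and the helper names below are hypothetical choices for illustration, not the authors' construction. The sketch also checks the telescoping identity sum_t gamma^t F_t = gamma^T Phi(s_T) - Phi(s_0). This identity is the usual reason potential-based shaping leaves the optimal policy unchanged, which is in the spirit of the invariant optimality condition mentioned above.

```python
from typing import Callable, Sequence

import numpy as np


def lyapunov_candidate(state: np.ndarray) -> float:
    """Hypothetical Lyapunov candidate V(s) = ||s||^2.

    Assumes (for illustration only) a task whose goal is to drive the
    state to the origin, where V attains its minimum of zero.
    """
    return float(np.dot(state, state))


def shaped_reward(
    reward: float,
    state: np.ndarray,
    next_state: np.ndarray,
    gamma: float,
    lyapunov: Callable[[np.ndarray], float] = lyapunov_candidate,
) -> float:
    """Potential-based shaping r' = r + gamma * Phi(s') - Phi(s), with Phi = -V.

    A transition that sufficiently lowers V (moves toward the equilibrium)
    earns a positive bonus; one that raises V is penalized.
    """
    phi_s = -lyapunov(state)
    phi_next = -lyapunov(next_state)
    return reward + gamma * phi_next - phi_s


def discounted_shaping_sum(states: Sequence[np.ndarray], gamma: float) -> float:
    """Discounted sum of the shaping terms F_t = gamma*Phi(s_{t+1}) - Phi(s_t)."""
    total = 0.0
    for t in range(len(states) - 1):
        # Passing reward=0.0 isolates the shaping term F_t.
        total += gamma**t * shaped_reward(0.0, states[t], states[t + 1], gamma)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trajectory = [rng.normal(size=2) for _ in range(6)]
    gamma = 0.99
    lhs = discounted_shaping_sum(trajectory, gamma)
    # Telescoping: only the endpoints of the trajectory matter.
    T = len(trajectory) - 1
    rhs = gamma**T * (-lyapunov_candidate(trajectory[-1])) + lyapunov_candidate(trajectory[0])
    print(bool(np.isclose(lhs, rhs)))  # prints True
```

With this choice of potential, the discounted shaping terms along any trajectory sum to a quantity that depends only on its first and last states. The shaping therefore changes the learning signal at each step without changing which policy is optimal.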

Bibliographic details

  • Source
    Neurocomputing | 2020, No. 14 | pp. 83-90 | 8 pages
  • Author affiliations

    Huazhong Univ Sci & Technol Sch Artificial Intelligence & Automat Key Lab Imaging Proc & Intelligent Control State Key Lab Digital Mfg Equipments & Technol Wuhan 430074 Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Reinforcement learning; Principled reward shaping; Lyapunov stability theory; Stochastic approximation;
