Journal: Acta Electronica Sinica (电子学报)

A Multiple-Goal Sarsa(λ) Algorithm Based on the Lost Reward of the Greatest Mass

Abstract

To address RoboCup, a typical multiple-goal reinforcement learning problem, this paper proposes LRGM-Sarsa(λ), a multiple-goal reinforcement learning algorithm based on the lost reward of the greatest mass. The algorithm estimates the lost reward of the greatest mass for each sub-goal and trades off the long-term rewards of the sub-goals, selecting the best joint action to produce an optimal joint (composite) policy. Within each single-goal learning module, it uses a Sarsa(λ) algorithm built on the B error function, an improved version of the MSBR error function. The action-selection probability function and the step-size parameter α are also optimized with respect to the B error function. Together these changes address the instability and non-convergence that reinforcement learning exhibits when generalizing with nonlinear function approximation. The algorithm is applied to training a local shooting policy in RoboCup 2D, and the experimental results show that it is more stable and converges faster.
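The abstract does not give the update rules, so the following is only a minimal sketch of the two standard building blocks it names, not the paper's method. It assumes textbook tabular Sarsa(λ) with accumulating eligibility traces, rather than the paper's B-error variant with nonlinear function approximation. For combining goals it assumes plain greatest-mass action selection (summing per-goal Q-values), rather than the LRGM lost-reward estimate. All function names, parameter defaults, and the toy states and actions are hypothetical.

```python
from collections import defaultdict


def sarsa_lambda_step(q, e, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update with accumulating traces.

    q and e map (state, action) pairs to action values and eligibility
    traces. Textbook form; NOT the paper's B-error-based update.
    """
    # TD error for the on-policy transition (s, a, r, s_next, a_next).
    delta = r + gamma * q[(s_next, a_next)] - q[(s, a)]
    # Mark the visited pair as eligible for credit.
    e[(s, a)] += 1.0
    # Update every traced pair in proportion to its trace, then decay it.
    for key in list(e):
        q[key] += alpha * delta * e[key]
        e[key] *= gamma * lam
    return delta


def greatest_mass_action(q_tables, s, actions):
    """Pick the action whose Q-values, summed over all goals, are largest.

    Plain greatest-mass selection; the paper's lost-reward rule is not
    specified in the abstract and is not reproduced here.
    """
    return max(actions, key=lambda a: sum(q[(s, a)] for q in q_tables))


if __name__ == "__main__":
    # Single-goal learner: one update on an empty table.
    q, e = defaultdict(float), defaultdict(float)
    sarsa_lambda_step(q, e, s=0, a="shoot", r=1.0, s_next=1, a_next="pass")
    print(q[(0, "shoot")], e[(0, "shoot")])

    # Two hypothetical goals that disagree about the best action.
    q_goal = {(0, "pass"): 1.0, (0, "shoot"): 0.5}
    q_safe = {(0, "pass"): 0.0, (0, "shoot"): 2.0}
    print(greatest_mass_action([q_goal, q_safe], 0, ["pass", "shoot"]))
```

In this sketch each sub-goal keeps its own Q-table and trace table, and only the action choice is shared, which matches the modular structure the abstract describes without committing to its specific loss estimate.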
