IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

A two stage learning technique for dual learning in the pursuit-evasion differential game



Abstract

This paper addresses the case of dual learning in the pursuit-evasion (PE) differential game and examines how quickly the players can learn their default control strategies. The players must learn their default control strategies simultaneously by interacting with each other, and each player's learning process depends on the rewards received from its environment. The learning process is implemented using a two-stage learning algorithm that combines the particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the Q-learning fuzzy inference system (QFIS) algorithm. The PSO algorithm serves as a global optimizer that autonomously tunes the parameters of a fuzzy logic controller, whereas the QFIS algorithm serves as a local optimizer. The two-stage learning algorithm is compared through simulation with the default control strategy, the PSO-based FLC algorithm, and the QFIS algorithm. Simulation results show that the players are able to learn their default control strategies, and that the two-stage algorithm outperforms both the PSO-based FLC algorithm and the QFIS algorithm with respect to learning time.
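The global-then-local structure the abstract describes can be sketched in a few lines. The sketch below is illustrative only: the `fitness` function is a hypothetical stand-in for evaluating a player's fuzzy controller in the PE game (the paper would use game outcomes such as capture time), and the local stage uses simple random hill-climbing in place of the actual QFIS fine-tuning. Only the two-stage shape — PSO for coarse global search over controller parameters, followed by a local refinement of the PSO result — mirrors the paper's approach.

```python
import random

def fitness(params):
    # Hypothetical surrogate cost: squared distance of the controller
    # parameter vector from an assumed optimum. The real algorithm would
    # evaluate the fuzzy controller in the pursuit-evasion game instead.
    target = [0.5, -1.2, 2.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))

def pso_stage(n_particles=20, dims=3, iters=100):
    """Stage 1: PSO as a global optimizer over the controller parameters."""
    random.seed(0)
    pos = [[random.uniform(-5, 5) for _ in range(dims)] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # personal bests
    gbest = min(pbest, key=fitness)[:]              # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = random.random(), random.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

def local_stage(params, step=0.01, iters=200):
    """Stage 2: local refinement of the PSO result (stand-in for QFIS)."""
    best = params[:]
    for _ in range(iters):
        cand = [p + random.uniform(-step, step) for p in best]
        if fitness(cand) < fitness(best):
            best = cand
    return best

coarse = pso_stage()        # global search
tuned = local_stage(coarse) # local fine-tuning never worsens the result
```

The design point is that the global stage supplies a good starting region quickly, so the local stage only has to polish, which is the source of the learning-time advantage the abstract reports.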
