IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

A two stage learning technique for dual learning in the pursuit-evasion differential game


Abstract

This paper addresses the case of dual learning in the pursuit-evasion (PE) differential game and examines how quickly the players can learn their default control strategies. The players must learn their default control strategies simultaneously by interacting with each other, and each player's learning process depends on the rewards received from its environment. The learning process is implemented using a two-stage learning algorithm that combines the particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the Q-learning fuzzy inference system (QFIS) algorithm. The PSO algorithm serves as a global optimizer that autonomously tunes the parameters of a fuzzy logic controller, whereas the QFIS algorithm serves as a local optimizer. The two-stage learning algorithm is compared through simulation with the default control strategy, the PSO-based FLC algorithm, and the QFIS algorithm. Simulation results show that the players are able to learn their default control strategies, and that the two-stage algorithm outperforms both the PSO-based FLC algorithm and the QFIS algorithm with respect to learning time.
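The coarse-to-fine structure described in the abstract can be sketched generically: a population-based global search (PSO) first locates a promising region of the controller's parameter space, and a local optimizer then refines that solution. The sketch below is a simplified illustration, not the paper's implementation: the quadratic surrogate cost, the parameter bounds, and the stochastic hill-climbing stand-in for the QFIS local stage are all assumptions made for brevity.

```python
import random

def pso(cost, dim, n_particles=20, iters=50, bounds=(-5.0, 5.0)):
    """Stage 1: global search with a basic particle swarm optimizer."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # each particle's best position
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + cognitive pull (pbest) + social pull (gbest)
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

def local_refine(cost, x, step=0.1, iters=200):
    """Stage 2: local refinement (a stand-in for the QFIS local optimizer)."""
    x, best = x[:], cost(x)
    for _ in range(iters):
        cand = [xi + random.gauss(0.0, step) for xi in x]
        c = cost(cand)
        if c < best:
            x, best = cand, c
    return x, best

# Hypothetical surrogate cost: distance of controller parameters from an
# (unknown) optimum, standing in for the reward signal from the PE game.
target = [1.0, -2.0, 0.5]
cost = lambda p: sum((pi - ti) ** 2 for pi, ti in zip(p, target))

random.seed(0)
coarse, c1 = pso(cost, dim=3)          # stage 1: global tuning
fine, c2 = local_refine(cost, coarse)  # stage 2: local polishing
assert c2 <= c1                        # refinement never worsens the solution
```

The division of labor mirrors the abstract's claim: PSO avoids poor local minima that a purely local learner would get trapped in, while the cheap local stage sharpens the swarm's coarse answer, reducing the total learning time compared with either stage alone.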
