Decentralized Learning in Pursuit-Evasion Differential Games with Multi-Pursuer and Single-Superior Evader

Annual IEEE International Systems Conference

Abstract

In this paper, we consider a multi-pursuer single-superior-evader pursuit-evasion differential game in which the speed of the evader is similar to the speed of each pursuer. A new fuzzy reinforcement learning algorithm is proposed for this game, and each pursuer uses it to learn its control strategy. The proposed algorithm uses the residual gradient fuzzy actor-critic learning (RGFACL) algorithm to tune the parameters of each pursuer's fuzzy logic controller (FLC). A formation control approach is embedded in the tuning mechanism of the learning pursuer's FLC so that the learning pursuer, or one of the other learning pursuers, can capture the superior evader. The formation control mechanism guarantees that the pursuers are distributed around the superior evader, which avoids collisions between pursuers. It also guarantees that the capture regions of every two adjacent pursuers overlap, or at least border each other, so that capture of the superior evader is guaranteed. The proposed algorithm is decentralized, as no communication among the pursuers is required; the only information each learning pursuer needs is the position and the speed of the superior evader. The algorithm is applied to a multi-pursuer single-superior-evader pursuit-evasion differential game, and the simulation results show its effectiveness: the superior evader is always captured by one or more of the pursuers that learn their strategies with the proposed algorithm.
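
The encirclement idea in the abstract can be pictured with a short sketch. The following Python snippet is a minimal illustration of the formation geometry only, not the paper's RGFACL learning: it assumes each pursuer knows its own index and the total number of pursuers (so angular slots around the evader can be assigned without any pursuer-to-pursuer communication), and the class name Pursuer, the ring_radius parameter, and the one-step prediction of the evader are hypothetical choices made for this example. Each pursuer uses only the evader's position and velocity, consistent with the information assumption stated in the abstract.

    # Minimal, hypothetical sketch of the encirclement geometry described in
    # the abstract. It does NOT implement the paper's RGFACL learning; the
    # index-based angular slot assignment, ring_radius, and one-step
    # prediction are illustrative assumptions.

    import numpy as np

    class Pursuer:
        def __init__(self, index, n_pursuers, pos, speed, ring_radius):
            self.index = index        # this pursuer's identity (assumed known locally)
            self.n = n_pursuers       # total number of pursuers (assumed known locally)
            self.pos = np.asarray(pos, dtype=float)
            self.speed = speed
            self.ring = ring_radius   # radius of the encirclement ring around the evader

        def step(self, evader_pos, evader_vel, dt):
            # Only the evader's position and velocity are used here, matching
            # the abstract's claim that no communication among pursuers is needed.
            predicted = (np.asarray(evader_pos, dtype=float)
                         + np.asarray(evader_vel, dtype=float) * dt)
            # Evenly spaced angular slots: adjacent capture regions border or
            # overlap once every slot on the ring is occupied.
            theta = 2.0 * np.pi * self.index / self.n
            target = predicted + self.ring * np.array([np.cos(theta), np.sin(theta)])
            direction = target - self.pos
            dist = np.linalg.norm(direction)
            if dist > 1e-9:
                self.pos = self.pos + (self.speed * dt / dist) * direction
            return self.pos

    # Usage: three pursuers take slots 120 degrees apart around the evader,
    # so their capture regions meet around it as the ring closes.
    pursuers = [Pursuer(i, 3, start, speed=1.0, ring_radius=0.5)
                for i, start in enumerate([(-5.0, 0.0), (5.0, 0.0), (0.0, 5.0)])]
    for _ in range(200):
        for p in pursuers:
            p.step(evader_pos=(0.0, 0.0), evader_vel=(0.9, 0.0), dt=0.05)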