International Automatic Control Conference

Fuzzy reinforcement learning algorithm for the pursuit-evasion differential games with superior evader


Abstract

This paper proposes a fuzzy reinforcement learning technique that enables a group of pursuers in pursuit-evasion (PE) differential games to learn, in a decentralized manner, how to capture a single superior evader. The evader is superior in terms of its maximum speed, which exceeds the maximum speed of the fastest pursuer in the game. The proposed technique combines a fuzzy actor-critic learning automaton (FACLA) algorithm with the Apollonius circle technique and a specific formation control strategy, which together define the reward function for each pursuer. This enables each pursuer to update its value function accurately and, accordingly, to take the right actions by tuning the parameters of its fuzzy logic controller (FLC). The formation control strategy is also used to keep the distribution angles of the pursuers around the evader as invariant as possible during the capturing process, and to avoid collisions among the pursuers. The superior evader is assumed to be intelligent: its strategy is to search continuously for a gap during the evasion process by using the Apollonius circle method. If there is a gap, the evader selects a path through it to escape; otherwise, the evader changes its direction to increase the capture time. Simulation results are given to validate the proposed learning algorithm.
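The abstract does not spell out the Apollonius circle construction, so the following is only a minimal sketch of the standard planar geometry, not the paper's formulation. It assumes every pursuer has the same speed ratio `speed_ratio = v_p / v_e < 1`, and the function names `apollonius_circle` and `find_gaps` are hypothetical. The circle is the set of points that pursuer and evader reach at the same time; seen from the evader, it always spans a half-angle of `asin(speed_ratio)` around the line of sight to the pursuer, which gives a simple angular test for a gap between neighbouring pursuers.

```python
import math


def apollonius_circle(pursuer, evader, speed_ratio):
    """Return (center, radius) of the points X with |X-P| / |X-E| = speed_ratio.

    speed_ratio = v_p / v_e must lie in (0, 1), i.e. the evader is faster.
    """
    if not 0.0 < speed_ratio < 1.0:
        raise ValueError("speed_ratio must be in (0, 1) for a superior evader")
    k2 = speed_ratio ** 2
    px, py = pursuer
    ex, ey = evader
    center = ((px - k2 * ex) / (1.0 - k2), (py - k2 * ey) / (1.0 - k2))
    radius = speed_ratio * math.hypot(px - ex, py - ey) / (1.0 - k2)
    return center, radius


def find_gaps(pursuers, evader, speed_ratio):
    """Return angular intervals (start, end), in radians, that are not blocked.

    Assumption (illustrative): a heading is blocked when it lies within
    asin(speed_ratio) of the bearing from the evader to some pursuer.
    """
    ex, ey = evader
    half_angle = math.asin(speed_ratio)
    bearings = sorted(math.atan2(py - ey, px - ex) for px, py in pursuers)
    gaps = []
    for i, start in enumerate(bearings):
        # The last pursuer is paired with the first one, wrapped by 2*pi.
        if i + 1 < len(bearings):
            end = bearings[i + 1]
        else:
            end = bearings[0] + 2.0 * math.pi
        free = (end - start) - 2.0 * half_angle
        if free > 1e-12:
            gaps.append((start + half_angle, end - half_angle))
    return gaps


if __name__ == "__main__":
    ring = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    print(apollonius_circle((4.0, 0.0), (0.0, 0.0), 0.5))
    print(len(find_gaps(ring, (0.0, 0.0), 0.5)))  # slower pursuers leave gaps
    print(len(find_gaps(ring, (0.0, 0.0), 0.8)))  # faster pursuers close them
```

Under these assumptions, four evenly spaced pursuers are 90 degrees apart as seen from the evader, so they leave gaps when `asin(speed_ratio)` is below 45 degrees and close them above it. This is one way to see why holding the distribution angles of the pursuers steady matters for the capture.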
