This paper proposes a reward-function formulation for the fuzzy actor-critic learning automaton (FACLA) algorithm that trains a team of pursuers to capture a single evader. All pursuers and the evader are assumed to move at similar speeds. With the proposed reward function, the FACLA algorithm can be applied in a decentralized manner: each pursuer learns to take appropriate actions by tuning the parameters of its fuzzy logic controller (FLC) with the FACLA algorithm. The proposed reward function enables each pursuer to update its value function accurately. It depends on two factors that teach each pursuer how to participate in capturing the evader. The first is the change in the line-of-sight (LOS) angle between each pursuer and the evader over two consecutive time instants. The second is the change in the Euclidean distance between each pursuer and the evader over two consecutive time instants. Simulation results are given to validate the FACLA algorithm with the proposed reward function.
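As a rough illustration of the two-factor reward described above, the following Python sketch combines the change in the LOS angle and the change in the Euclidean distance between a pursuer and the evader at two consecutive time instants. The weights `w_los` and `w_dist`, the sign conventions, and the function names are assumptions for illustration, not the paper's exact formulation.

```python
import math

def los_angle(pursuer, evader):
    """Line-of-sight angle from the pursuer to the evader (radians)."""
    return math.atan2(evader[1] - pursuer[1], evader[0] - pursuer[0])

def distance(pursuer, evader):
    """Euclidean distance between the pursuer and the evader."""
    return math.hypot(evader[0] - pursuer[0], evader[1] - pursuer[1])

def reward(p_prev, p_curr, e_prev, e_curr, w_los=1.0, w_dist=1.0):
    """Hypothetical two-factor reward over two consecutive time instants:
    - first factor: change in the LOS-angle magnitude between pursuer and evader,
    - second factor: change in the pursuer-evader Euclidean distance.
    A negative change in either factor (pursuer aligning with the LOS and
    closing the distance) yields a positive reward under this sign convention."""
    d_los = abs(los_angle(p_curr, e_curr)) - abs(los_angle(p_prev, e_prev))
    d_dist = distance(p_curr, e_curr) - distance(p_prev, e_prev)
    return -w_los * d_los - w_dist * d_dist
```

For example, a pursuer moving from (0, 0) to (1, 0) toward a stationary evader at (5, 0) reduces the distance, so the reward is positive; moving away yields a negative reward.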