We introduce N-learning, a novel reinforcement learning method for multiagent systems. It was developed to cope with the state space explosion caused by the presence of additional agents in an environment. We apply N-learning to a pursuit-evasion problem in which a pursuer aims to compute optimal policies for intercepting a deterministically moving evader. The method combines an action selection component, which can be realised through a number of techniques, with a heuristic reinforcement learning reward function. We demonstrate that N-learning can outperform Q-learning on the pursuit-evasion task.