Markov games are a framework that formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new, simple algorithm in this framework, QL_2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL_2 converges to optimal mixed policies, as minimax-Q does, while using a surprisingly simple and cheap gradient-based update rule.
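The abstract does not spell out QL_2's update rule, so the following is only an illustrative sketch of what a cheap gradient-based mixed-policy update looks like in the simplest zero-sum setting, matching pennies, where the unique optimal policy is mixed (play each action with probability 0.5). The payoff function, the step-size schedule, and the name `gradient_play` are assumptions for illustration, not the paper's algorithm.

```python
import math

# Matching pennies, a two-player zero-sum matrix game.
# If the row player plays action 0 with probability p and the column
# player plays action 0 with probability q, the row player's expected
# payoff is V(p, q) = (2p - 1)(2q - 1).
# The optimal mixed policy for both players is p = q = 0.5.

def gradient_play(steps=5000, p=0.8, q=0.2):
    """Projected gradient play on mixed policies (illustrative sketch)."""
    avg_p, avg_q = 0.0, 0.0
    for t in range(1, steps + 1):
        eta = 0.1 / math.sqrt(t)       # decreasing step size (assumed schedule)
        grad_p = 2 * (2 * q - 1)       # dV/dp: row player ascends
        grad_q = 2 * (2 * p - 1)       # dV/dq: column player descends
        p = min(max(p + eta * grad_p, 0.0), 1.0)   # project back to [0, 1]
        q = min(max(q - eta * grad_q, 0.0), 1.0)
        avg_p += (p - avg_p) / t       # running (time-averaged) policies
        avg_q += (q - avg_q) / t
    return avg_p, avg_q

avg_p, avg_q = gradient_play()
print(avg_p, avg_q)  # time-averaged policies hover near the optimal 0.5
```

Plain gradient play cycles around the equilibrium in matching pennies, which is why the sketch reports time-averaged policies; the averages, rather than the iterates themselves, approach the optimal mixed strategy.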