Machine learning enables multi-robot systems to carry out desired tasks in unknown dynamic environments. In this paper, we extend the single-agent Q-learning algorithm to a multi-robot box-pushing system operating in an unknown dynamic environment with randomly distributed obstacles. Two kinds of extension are available: directly extending MDP-based (Markov Decision Process based) Q-learning to the multi-robot domain, and SG-based (Stochastic Game based) Q-learning. We select the first kind of extension because of its simplicity. The learning space, the box dynamics, and the reward function are presented in this paper. Furthermore, a simulation system is developed, and its results demonstrate the effectiveness, robustness, and adaptability of this learning-based multi-robot system. Statistical analysis of the results also shows that the robots learn a correct cooperative strategy even in a dynamic environment.
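The direct MDP-based extension mentioned above reuses the standard single-agent tabular Q-learning update in each robot. The following is a minimal sketch of that update rule; the function name, the dictionary-based Q-table, and the parameter values are illustrative assumptions, not details taken from the paper.

```python
def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Watkins' Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict mapping (state, action) pairs to values; unseen pairs
    default to 0.0."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]

# In a direct multi-robot extension, each robot would keep its own Q-table
# and apply this same update to its local state observation and reward.
Q = {}
v = q_update(Q, state=0, action=1, reward=1.0, next_state=2, actions=[0, 1])
```

With an empty table, the update above yields `alpha * reward = 0.1`, after which the Q-table holds one entry for the visited state-action pair.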