Abstract. We discuss Profit-sharing, an experience-based reinforcement learning approach, similar to Monte Carlo reinforcement learning methods, that can learn robust and effective actions in uncertain, dynamic multi-agent systems. We introduce the cut-loop routine, which discards looping behavior, and demonstrate its effectiveness empirically in a simplified NEO (non-combatant evacuation operation) domain, in which several agents ferry groups of evacuees to one of several shelters. We show that the cut-loop routine makes the Profit-sharing approach adaptive and robust in a dynamic and uncertain domain, without requiring predefined knowledge or subgoals. We also compare it empirically with the popular Q-learning approach.
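
To make the idea of discarding looping behavior concrete, the following is a minimal sketch and not the paper's actual routine. It assumes an episode is recorded as a list of (state, action) pairs and that a loop is any stretch of the episode that returns to an already visited state. The names `cut_loops` and `reinforce`, the geometrically decreasing credit function, and the decay value of 0.5 are illustrative assumptions.

```python
from collections import defaultdict


def cut_loops(episode):
    """Return the episode with loops removed (hypothetical helper).

    When a state reappears, every pair recorded since its first visit is
    discarded, so only the action taken on the last visit survives.
    """
    trimmed = []
    position = {}  # state -> index of that state in `trimmed`
    for state, action in episode:
        if state in position:
            cut = position[state]
            # Forget every state that belonged to the discarded loop.
            for looped_state, _ in trimmed[cut:]:
                del position[looped_state]
            del trimmed[cut:]
        position[state] = len(trimmed)
        trimmed.append((state, action))
    return trimmed


def reinforce(weights, episode, reward, decay=0.5):
    """Distribute the reward backward over the episode (assumed credit function).

    The last pair receives the full reward; each earlier pair receives
    `decay` times the credit of the pair that follows it.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay


# Example: the agent wanders B -> C -> B before reaching the goal from D.
episode = [("A", "right"), ("B", "up"), ("C", "left"), ("B", "right"), ("D", "up")]
trimmed = cut_loops(episode)

weights = defaultdict(float)
reinforce(weights, trimmed, reward=1.0)
```

In this example the detour through C is dropped, so only the three pairs on the loop-free path receive credit and the rule ("B", "up") that led into the loop is never reinforced.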