This paper proposes an extension of reinforcement learning that lets each robot learn a conflict-free strategy while avoiding the state-explosion problem. The key idea is to divide the state-action learner in each robot into a set of discrete learning units and let them compete with one another, so that task differentiation is achieved easily. In the proposed architecture, a robot decides its action by choosing one of its internal learners; the criterion for selecting an internal agent is a utility vector. We applied this architecture to computer simulations of a seesaw balancing problem and let the robots adjust their utility vectors to differentiate their behavior from one another.
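The abstract does not specify the internal learners or the utility-update rule, so the following is only a minimal sketch of the described architecture: several independent tabular Q-learning units inside one robot, with action selection delegated to the unit whose entry in the utility vector is largest. The class names (`QLearner`, `Robot`), the reward-driven utility update, and all hyperparameters are illustrative assumptions, not the paper's method.

```python
import random

class QLearner:
    """One internal learning unit: an independent tabular Q-learner (assumed)."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.n_actions = n_actions

    def act(self, s):
        # epsilon-greedy action selection within this unit
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        row = self.q[s]
        return row.index(max(row))

    def update(self, s, a, r, s2):
        # standard one-step Q-learning update
        best = max(self.q[s2])
        self.q[s][a] += self.alpha * (r + self.gamma * best - self.q[s][a])

class Robot:
    """Robot that picks one internal learner per step via its utility vector."""
    def __init__(self, n_units, n_states, n_actions, beta=0.05):
        self.units = [QLearner(n_states, n_actions) for _ in range(n_units)]
        self.utility = [1.0 / n_units] * n_units  # utility vector over units
        self.beta = beta  # utility learning rate (assumed)

    def step(self, s, env_step):
        i = self.utility.index(max(self.utility))  # unit with highest utility wins
        a = self.units[i].act(s)
        s2, r = env_step(a)
        self.units[i].update(s, a, r, s2)
        # assumed rule: reinforce the chosen unit's utility by the reward,
        # so successful units come to dominate and the units differentiate
        self.utility[i] += self.beta * r
        return s2, r
```

In this sketch, units that happen to earn reward are selected more often, which is one plausible way the competition between units could drive the behavioral differentiation the paper describes.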