This paper proposes a modular reinforcement learning method with adaptive module acquisition, in which a learning agent starts with states assigned only to fundamental modules and acquires new modules during learning as needed. This relaxes the difficulty of designing a suitable module structure for the task in advance, without a-priori knowledge of the problem. The criterion for introducing new states is derived from a fundamental property of reinforcement learning: after sufficient learning with a suitable state space, state values increase gradually along the greedy policy. The proposed method is implemented on Q-learning. It is applied to the so-called "pursuit problem", a computer simulation in which two learning agents are navigated to catch a randomly moving object, and also to a problem of navigating a single agent through a simple maze. Computer simulations show that the proposed method achieves fairly good performance with fewer states than both conventional Q-learning and modular Q-learning without the capability of acquiring new modules.
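The criterion above — that state values should increase along the greedy policy once learning has converged on a suitable state space — can be sketched in code. The following is a minimal illustrative example, not the paper's actual algorithm or environments: it runs tabular Q-learning on a hypothetical one-dimensional corridor task and then flags states whose greedy successor does not have a higher value, i.e. the states that would be candidates for acquiring a new module under the paper's scheme. All names (`step`, `greedy`, `flagged`) are this sketch's own.

```python
import random
from collections import defaultdict

N = 6                 # corridor states 0..5, with the goal at state 5
ACTIONS = (-1, +1)    # move left / move right
ALPHA, GAMMA = 0.5, 0.9

Q = defaultdict(float)  # tabular Q-values, Q[(state, action)]

def step(s, a):
    """One environment transition; reward 1 only on reaching the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

random.seed(0)
for _ in range(2000):                        # Q-learning with random exploration
    s = 0
    for _ in range(50):
        a = random.choice(ACTIONS)           # off-policy: behavior is random
        s2, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if done:
            break

def V(s):
    return max(Q[(s, a)] for a in ACTIONS)   # state value under greedy policy

def greedy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# Criterion check: along the greedy policy, V should increase toward the goal.
# A state whose greedy successor does not improve the value is flagged as a
# candidate for introducing a new module.
flagged = []
for s in range(N - 1):
    s2, _, done = step(s, greedy(s))
    if not done and V(s2) <= V(s):
        flagged.append(s)
print("flagged states:", flagged)
```

With a state space that is adequate for the task, as here, no states are flagged after sufficient learning; in the paper's setting, flagged states would instead trigger module acquisition.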