This paper proposes adaptive module acquisition for modular reinforcement learning, in which a learning agent starts with a set of fundamental modules and acquires new modules during learning as necessary. This relaxes the difficulty of designing a suitable module structure for a task in advance without a-priori knowledge of the problem. The criterion for introducing a new module is derived from a fundamental property of reinforcement learning: after sufficient learning, state values increase with high probability along the greedy policy. The proposed method is implemented on Q-learning and applied to the so-called "pursuit problem," simulated on a computer, in which two learning agents are navigated to catch a randomly moving object. Simulation results show that the proposed method achieves performance equal to or better than that of standard Q-learning, or of modular Q-learning without the capability to acquire new modules, while using fewer states.
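The module-introduction criterion mentioned above rests on a basic property of value-based reinforcement learning: once learning has converged sufficiently, following the greedy policy should move the agent through states of increasing value toward the goal. The sketch below illustrates this property (not the paper's actual implementation) with tabular Q-learning on a tiny chain environment; the environment, reward scheme, and the `greedy_values_increase` helper are all illustrative assumptions.

```python
import random

# Illustrative sketch, not the paper's method: tabular Q-learning on a
# 5-state chain, then a check of the property the abstract relies on --
# after sufficient learning, V(s) = max_a Q(s, a) increases along the
# greedy policy toward the goal state.

N_STATES = 5          # states 0..4; state 4 is the goal (assumed toy setup)
ACTIONS = [-1, +1]    # move left or right along the chain
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(s, a):
    """One environment transition: reward 1 only on reaching the goal."""
    s2 = max(0, min(N_STATES - 1, s + a))
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    """Standard epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < EPS:
                i = rng.randrange(2)                       # explore
            else:
                i = max((0, 1), key=lambda j: Q[s][j])     # exploit
            s2, r, done = step(s, ACTIONS[i])
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][i] += ALPHA * (target - Q[s][i])
            s = s2
    return Q

def greedy_values_increase(Q):
    """Check whether V(s) strictly increases along the greedy path to the goal."""
    s = 0
    for _ in range(N_STATES):
        i = max((0, 1), key=lambda j: Q[s][j])
        s2, _, done = step(s, ACTIONS[i])
        if done:
            return True   # goal reached; values increased along the whole path
        if max(Q[s2]) <= max(Q[s]):
            return False  # value failed to increase -> criterion violated
        s = s2
    return False          # greedy policy never reached the goal
```

In the paper's setting, a violation of this monotonicity after sufficient learning signals that the current module set cannot represent a good policy, which is when a new module would be introduced.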