Q-Learning has gained increasing attention as a promising real-time learning scheme from delayed reinforcement. Being compact, model-free, and theoretically optimal, it is commonly preferred to AHC-Learning and its derivatives. However, it has long been noticed that theoretical optimality has to be sacrificed in order to meet the constraints of most applications. In this article we report on experiments with modified Q-Learning algorithms and identify their key ingredients for practical success in reinforcement learning. These include optimistic initialization, the principle of piecewise constancy of policy, and the use of activity traces. Finally, we extend these algorithms to growing RBF networks with additional on-line learning vector quantization (adaptive perceptualization) and obtain very encouraging results as well. Our test bed is pole balancing with additional noise on the sensory input.
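To make two of the named ingredients concrete, the following is a minimal sketch of tabular Q-Learning with optimistic initialization and eligibility ("activity") traces. The trace handling follows Watkins' Q(λ); the article's exact activity-trace variant, the environment interface (env.reset, env.step), and all parameter values are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def q_lambda(env, n_states, n_actions, episodes=500,
             alpha=0.1, gamma=0.99, lam=0.8, epsilon=0.1, q_init=10.0):
    rng = np.random.default_rng()

    def select(s):
        # epsilon-greedy behavior policy
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    # Optimistic initialization: start all Q-values above any reachable
    # return, so unvisited state-action pairs look attractive and drive
    # systematic exploration without extra machinery.
    Q = np.full((n_states, n_actions), q_init)
    for _ in range(episodes):
        E = np.zeros_like(Q)                # eligibility ("activity") traces
        s = env.reset()
        a = select(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = select(s2)                 # next behavior action
            a_star = int(np.argmax(Q[s2]))  # greedy successor action
            delta = r + (0.0 if done else gamma * Q[s2, a_star]) - Q[s, a]
            E[s, a] += 1.0                  # mark this pair as active
            Q += alpha * delta * E          # credit all recently active pairs
            # Watkins' rule: decay traces while behaving greedily,
            # cut them after an exploratory action.
            if a2 == a_star:
                E *= gamma * lam
            else:
                E[:] = 0.0
            s, a = s2, a2
    return Q
```

Applied to a discretized pole-balancing task, the optimistic start replaces explicit exploration bonuses, while the traces propagate the delayed failure signal back over recently visited state-action pairs.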