We present a hybrid Adaptive Heuristic Critic (AHC) architecture that learns an internal model of a maze environment through interaction with it. The adaptive critic's model is built around a radial basis function (RBF) neural network. Over successive trials it learns the V-function, a mapping from positions in the maze to their values. The model takes continuous-valued spatial inputs and has the useful property of "local generalisation": the value learned at one position carries over to the surrounding region of the maze. The action policy allows straight-line movement to anywhere in the maze in a single step, and is implemented using a genetic algorithm (GA) that searches for an optimal movement at each time step. Although, for computational convenience, the GA is still based on a discretised search of the maze space, the architecture should generalise well to evolutionary algorithms better suited to searching continuous spaces, allowing the concept of a discrete state to be dispensed with altogether.
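To make the idea of an RBF-based critic with local generalisation concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian basis functions on a fixed grid over a unit-square maze, a linear output layer, and a TD(0)-style weight update; the class name `RBFCritic`, its parameters (`grid`, `width`, `lr`, `gamma`) and the goal position are all hypothetical choices made for illustration. The GA-based action search is not shown.

```python
import math


class RBFCritic:
    """Hypothetical critic: V(x, y) is a weighted sum of Gaussian RBFs."""

    def __init__(self, grid=5, width=0.2, lr=0.1, gamma=0.9):
        # Assumption: centres lie on a regular grid over the unit square.
        step = 1.0 / (grid - 1)
        self.centres = [(i * step, j * step)
                        for i in range(grid) for j in range(grid)]
        self.width = width      # shared Gaussian width (assumed)
        self.lr = lr            # learning rate (assumed)
        self.gamma = gamma      # discount factor (assumed)
        self.weights = [0.0] * len(self.centres)

    def features(self, pos):
        """Gaussian activation of every basis function at a continuous position."""
        x, y = pos
        denom = 2.0 * self.width ** 2
        return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / denom)
                for cx, cy in self.centres]

    def value(self, pos):
        """Estimated value of a continuous position in the maze."""
        return sum(w * f for w, f in zip(self.weights, self.features(pos)))

    def td_update(self, pos, reward, next_pos, terminal=False):
        """One TD(0)-style update of the output weights; returns the TD error."""
        target = reward
        if not terminal:
            target += self.gamma * self.value(next_pos)
        error = target - self.value(pos)
        for k, f in enumerate(self.features(pos)):
            self.weights[k] += self.lr * error * f
        return error


if __name__ == "__main__":
    critic = RBFCritic()
    goal = (0.9, 0.9)  # hypothetical rewarded position
    for _ in range(50):
        critic.td_update(goal, reward=1.0, next_pos=goal, terminal=True)
    # Only the goal was ever visited; nearby positions still gain value.
    for pos in [goal, (0.8, 0.8), (0.1, 0.1)]:
        print(pos, round(critic.value(pos), 3))
```

Because each Gaussian responds to a neighbourhood rather than a single cell, training on one position also raises the estimate at unvisited positions close to it, while distant positions stay near zero. This is the local generalisation property described above, obtained here without any discrete state representation in the critic.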