Is it possible to plan at a coarse level while acting at a fine level with a neural-network (NN) reinforcement-learning (RL) planner? This work presents an NN planner that plans at an abstract level and is used to control a simulated robot in a stochastic landmark-navigation problem. The controller has both reactive components, based on actor-critic RL, and planning components inspired by the Dyna-PI architecture (roughly, RL plus a model of the environment). Coarse planning is based on macro-actions, each defined as a sequence of identical primitive actions: the planner updates the evaluations and the action policy while generating simulated experience at the macro level with a model of the environment (an NN trained at the macro level). The simulations show how the controller works, demonstrate the advantages of using a discount coefficient tuned to the level of planning coarseness, and suggest that discounted RL has problems dealing with long periods of time.
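The core ideas in the abstract — macro-actions built from K identical primitive actions, Dyna-style planning with a learned macro-level model, and a discount matched to planning coarseness (gamma raised to the power K, since each macro transition spans K primitive steps) — can be illustrated with a minimal sketch. Everything below is a hypothetical toy: a tabular Dyna-Q update on a 1-D corridor, not the paper's NN actor-critic or its landmark-navigation task; the corridor size, K=3, and all learning parameters are assumptions for illustration only.

```python
import random

random.seed(0)

# -- toy setup (hypothetical; the paper uses an NN actor-critic on a
#    simulated landmark-navigation task, not this tabular corridor) --
N_STATES = 10              # corridor cells 0..9, goal at the right end
GOAL = N_STATES - 1
ACTIONS = (-1, +1)         # primitive actions: one step left / right
K = 3                      # macro-action = K identical primitive actions
GAMMA = 0.95               # per-primitive-step discount
GAMMA_MACRO = GAMMA ** K   # discount tuned to the planning coarseness
ALPHA = 0.5                # learning rate
N_PLAN = 10                # simulated macro-level backups per real step

def step_macro(s, a):
    """Execute one primitive action K times; reward 1 on reaching the goal."""
    for _ in range(K):
        s = max(0, min(GOAL, s + a))
        if s == GOAL:
            return s, 1.0
    return s, 0.0

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                 # learned macro-level model: (s, a) -> (s', r)

s = 0
for _ in range(300):
    a = random.choice(ACTIONS)        # random exploration, for simplicity
    s2, r = step_macro(s, a)
    model[(s, a)] = (s2, r)           # macro-level model learning
    # One real backup plus N_PLAN simulated (Dyna-style) backups, all
    # discounted with GAMMA_MACRO because each transition spans K steps.
    backups = [((s, a), (s2, r))] + random.choices(list(model.items()), k=N_PLAN)
    for (bs, ba), (bs2, br) in backups:
        tgt = br + (0.0 if bs2 == GOAL else
                    GAMMA_MACRO * max(Q[(bs2, b)] for b in ACTIONS))
        Q[(bs, ba)] += ALPHA * (tgt - Q[(bs, ba)])
    s = 0 if s2 == GOAL else s2

# Greedy macro-policy over the states the agent actually visits (0, 3, 6):
# all three should point right (+1), toward the goal.
greedy = {st: max(ACTIONS, key=lambda a: Q[(st, a)]) for st in (0, 3, 6)}
print(greedy)
```

The point of `GAMMA_MACRO` is that a macro backup must account for the real time a macro-action consumes: discounting each macro transition by gamma to the K keeps the learned values consistent with the primitive-level task, which is the coarseness-tuning advantage the abstract reports.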