Complicated tasks are often difficult to express with a single reward system. In human learning, the relation between sensory inputs and action outputs can be understood as having been acquired beforehand through an internal multidimensional reward system. We introduce reinforcement learning under multidimensional evaluation, in which the internal reward system includes both immediate evaluations and delayed rewards. The proposed learning system is a two-layered Q-learning architecture combined with a dynamic cell structure. For a pushing task performed by a manipulator, we assume that information from touch sensors and from the motion detector of the vision system is available. Simulation showed that the knowledge acquired in the lower layer greatly helps in learning the pushing task.
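As an illustration only, the idea of learning from a multidimensional reward can be sketched as a tabular Q-learning step that keeps one value per reward dimension. This is a minimal sketch under stated assumptions, not the architecture of the paper: the function name `q_update_multi`, the weighted-sum rule for choosing the greedy action, and the parameter values are all hypothetical, and the two-layer structure and the dynamic cell structure are not modeled.

```python
import numpy as np


def q_update_multi(q, state, action, rewards, next_state,
                   alpha=0.1, gamma=0.9, weights=None):
    """One tabular Q-learning step with a vector of rewards (illustrative).

    q has shape (n_states, n_actions, n_rewards): one Q-value per
    reward dimension, e.g. a touch-based and a motion-based evaluation.
    """
    rewards = np.asarray(rewards, dtype=float)
    n_rewards = q.shape[2]
    if weights is None:
        # Assumption: equal weighting of the reward dimensions.
        weights = np.ones(n_rewards) / n_rewards
    # Assumption: the greedy next action maximizes the weighted sum
    # of the per-dimension values.
    best_next = int(np.argmax(q[next_state] @ weights))
    # Standard Q-learning target, applied to every dimension at once.
    target = rewards + gamma * q[next_state, best_next]
    q[state, action] += alpha * (target - q[state, action])
    return q


# Hypothetical usage: 3 states, 2 actions, 2 reward dimensions.
q = np.zeros((3, 2, 2))
q_update_multi(q, state=0, action=1, rewards=[1.0, 0.0], next_state=1)
```

Keeping the dimensions separate in the table, and combining them only when an action is selected, is one simple way to let immediate and delayed evaluations be learned side by side.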