We present three principles of hierarchical task composition within a single agent using reinforcement learning to solve continuous control problems. We consider complex tasks whose goals are defined as conjunctions of subgoals, each learned as a separate task via Q-learning. However, subgoals may depend on each other, which requires particular task composition principles. In the first principle, the Q-function of one task is underlaid with the Q-function of an avoidance task, yielding a composition in which the latter may veto an action of the former. The second principle uses explicit task activation as a hierarchical relation between two tasks. A subtask activation lasts a single time-step, whose length is adapted to the particular subtask's state-space discretization. In the third principle, two tasks are related such that the hierarchically higher one perturbs the goal state of the lower one in the direction of its own goal. These principles define interaction in a multi-layer architecture, with sequential task composition within each layer, and with each layer maintaining the system in an equilibrium condition. The approach is demonstrated on a task in which a truck navigates backward to a docking point.
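The first principle (underlaying a task's Q-function with that of an avoidance task, which may veto actions) can be sketched as follows. The veto threshold, array shapes, and fallback rule are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def compose_with_veto(q_main, q_avoid, veto_threshold=-0.5):
    """Select an action for the main task, letting the underlaid
    avoidance task veto actions whose Q-value signals danger.

    q_main, q_avoid: 1-D arrays of Q-values over the same action set.
    veto_threshold: assumed cutoff below which the avoidance task vetoes.
    Returns the index of the chosen action.
    """
    allowed = q_avoid >= veto_threshold          # avoidance task's veto mask
    if not allowed.any():                        # all actions vetoed: fall
        return int(np.argmax(q_avoid))           # back to the safest one
    masked = np.where(allowed, q_main, -np.inf)  # exclude vetoed actions
    return int(np.argmax(masked))                # greedy choice among the rest

# Example: action 2 is best for the main task but is vetoed by the
# avoidance task, so the composition picks action 1 instead.
q_main = np.array([0.1, 0.4, 0.9])
q_avoid = np.array([0.0, -0.2, -0.9])
print(compose_with_veto(q_main, q_avoid))  # → 1
```

The avoidance task thus acts as a hard constraint on the main task's greedy policy rather than being blended into a single value function.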