Many robotic control problems, such as autonomous helicopter flight, legged robot locomotion, and autonomous driving, remain challenging even for modern reinforcement learning algorithms. Some of the reasons are: (i) it can be hard to write down, in closed form, a formal specification of the control task (for example, what is the cost function for "driving well"?); (ii) it is often difficult to learn a good model of the robot's dynamics; and (iii) even given a complete specification of the problem, it is often computationally difficult to find a good closed-loop controller for a high-dimensional, stochastic control task. However, when we are allowed to learn from a human demonstration of a task (in other words, if we are in the apprenticeship learning setting), a number of efficient algorithms can be used to address each of these problems. To motivate the first of these problems, consider teaching a young adult to drive: rather than telling the student what the cost function for driving is, it is much easier and more natural to demonstrate driving to them and have them learn from the demonstration. In practical applications, it is also (perhaps surprisingly) common practice to manually tweak cost functions until the correct behavior is obtained. Thus, we would like to devise algorithms that can learn from a teacher's demonstration without needing to be told the cost function explicitly. For example, can we "guess" the teacher's cost function based on the demonstration, and use that in our own learning task? Ng and Russell [8] developed a set of inverse reinforcement learning algorithms for guessing the teacher's cost function.
More recently, Abbeel and Ng [1] showed that even though the teacher's "true" cost function is ambiguous and thus can never be recovered, it is nevertheless possible to recover a cost function that allows us to learn a policy whose performance is comparable to the teacher's, where performance is evaluated on the teacher's unknown (and unknowable) cost function. Thus, access to a demonstration removes the need to explicitly write down a cost function.