International Conference on Algorithmic Learning Theory

Reinforcement Learning and Apprenticeship Learning for Robotic Control



Abstract

Many robotic control problems, such as autonomous helicopter flight, legged robot locomotion, and autonomous driving, remain challenging even for modern reinforcement learning algorithms. These problems are hard for several reasons: (i) it can be difficult to write down, in closed form, a formal specification of the control task (for example, what is the cost function for "driving well"?); (ii) it is often difficult to learn a good model of the robot's dynamics; and (iii) even given a complete specification of the problem, it is often computationally difficult to find a good closed-loop controller for a high-dimensional, stochastic control task. However, when we are allowed to learn from a human demonstration of a task - in other words, when we are in the apprenticeship learning setting - a number of efficient algorithms can be used to address each of these problems. To motivate the first problem described above, consider teaching a young adult to drive: rather than telling the student what the cost function for driving is, it is much easier and more natural to demonstrate driving and have the student learn from the demonstration. In practical applications, it is also (perhaps surprisingly) common practice to manually tweak cost functions until the correct behavior is obtained. We would therefore like algorithms that can learn from a teacher's demonstration without needing to be told the cost function explicitly. For example, can we "guess" the teacher's cost function from the demonstration and use it in our own learning task? Ng and Russell [8] developed a set of inverse reinforcement learning algorithms for guessing the teacher's cost function. More recently, Abbeel and Ng [1] showed that even though the teacher's "true" cost function is ambiguous and thus can never be recovered, it is nevertheless possible to recover a cost function that lets us learn a policy whose performance is comparable to the teacher's, where performance is evaluated on the teacher's unknown (and unknowable) cost function. Thus, access to a demonstration removes the need to write down a cost function explicitly.
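The reduction behind [1] can be sketched briefly. If the unknown cost is assumed linear in known features, R(s) = w . phi(s), then any policy whose expected discounted feature counts match the teacher's is guaranteed comparable performance under every such w, including the teacher's unknown one. The following is a minimal illustrative sketch of the projection variant of that idea on a toy tabular MDP; it is not the authors' code, and the random MDP, one-hot features, and helper names (solve_mdp, feature_expectations) are assumptions made for the example.

```python
# Sketch of apprenticeship learning via inverse RL (projection variant,
# after Abbeel & Ng [1]) on a toy tabular MDP. Everything here -- the
# random MDP, the one-hot features, the helper names -- is illustrative.
import numpy as np

n_states, n_actions, gamma = 16, 4, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
phi = np.eye(n_states)              # one-hot features, so R(s) = w . phi(s) = w[s]
start = np.full(n_states, 1.0 / n_states)

def solve_mdp(w, iters=500):
    """Value iteration for reward w . phi; returns a greedy deterministic policy."""
    r = phi @ w
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r + gamma * (P @ V)     # shape (n_actions, n_states)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

def feature_expectations(policy):
    """mu(pi) = E[sum_t gamma^t phi(s_t)], computed exactly from occupancies."""
    Ppi = P[policy, np.arange(n_states)]            # (n_states, n_states)
    d = np.linalg.solve(np.eye(n_states) - gamma * Ppi.T, start)
    return phi.T @ d

# Stand-in for the teacher: feature expectations of a policy optimal for a
# hidden "true" reward. With real demonstrations, mu_E is the empirical
# average of discounted feature counts along the demonstrated trajectories.
mu_E = feature_expectations(solve_mdp(rng.normal(size=n_states)))

# Projection algorithm: maintain mu_bar, the closest point to mu_E in the
# convex hull of the feature expectations of the policies found so far.
mu_bar = feature_expectations(solve_mdp(rng.normal(size=n_states)))
for i in range(50):
    w = mu_E - mu_bar               # current guess at the cost weights
    if np.linalg.norm(w) <= 1e-4:   # mu_bar matches mu_E: the mixed policy
        break                       # scores within eps of the teacher
    mu_i = feature_expectations(solve_mdp(w))
    step = mu_i - mu_bar
    if step @ step < 1e-12:         # no further progress possible
        break
    mu_bar += (step @ w) / (step @ step) * step     # orthogonal projection

print(f"iterations: {i + 1}, gap: {np.linalg.norm(mu_E - mu_bar):.5f}")
```

In practice mu_E would be estimated by averaging discounted feature counts over the demonstration trajectories, and the inner solve_mdp call would be replaced by whatever RL solver suits the domain; the outer loop is agnostic to both choices.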
