European conference on machine learning and knowledge discovery in databases

Learning from Demonstrations: Is It Worth Estimating a Reward Function?



Abstract

This paper provides a comparative study of Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, based on Markov Decision Processes (MDP), for the imitation learning problem, where an agent tries to learn from the demonstrations of an expert. In the AL framework, the agent tries to learn the expert's policy directly, whereas in the IRL framework, the agent tries to learn a reward that explains the behavior of the expert; this reward is then optimized to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not yet been addressed in the literature. We provide partial answers, from both a theoretical and an empirical point of view.
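To make the contrast concrete, below is a minimal sketch of the two routes to imitation on a toy chain MDP. It is not the paper's algorithms: the chain dynamics, the majority-vote policy imitation, and the visitation-count reward estimate are all illustrative stand-ins for a real AL classifier and a real IRL method.

```python
# Toy comparison: imitate a policy directly (AL-style) vs. estimate a
# reward and then optimize it (IRL-style). All modeling choices here are
# illustrative assumptions, not the methods studied in the paper.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9

# Deterministic chain: action 0 moves left, action 1 moves right.
def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Expert demonstrations: the expert always moves right (toward state 4).
demos = [(s, 1) for s in range(n_states)]

# --- AL-style route: learn the expert's policy directly. ---
# Reduced here to the majority expert action per state, a crude
# behavioral-cloning stand-in for a supervised classifier.
policy_al = np.zeros(n_states, dtype=int)
for s in range(n_states):
    actions = [a for (s_, a) in demos if s_ == s]
    policy_al[s] = max(set(actions), key=actions.count) if actions else 0

# --- IRL-style route: estimate a reward, then optimize it. ---
# Crude reward estimate: reward states in proportion to how often the
# expert's transitions land on them (a stand-in for a real IRL method).
visits = np.zeros(n_states)
for s, a in demos:
    visits[step(s, a)] += 1
reward = visits / visits.sum()

def value_iteration(reward, tol=1e-8):
    """Solve the toy MDP for the estimated reward; return a greedy policy."""
    V = np.zeros(n_states)
    while True:
        Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

policy_irl = value_iteration(reward)
print("AL policy :", policy_al)   # actions copied from the expert
print("IRL policy:", policy_irl)  # actions optimizing the learned reward
```

On this trivial example both routes recover the same behavior; the paper's question is precisely when the extra step of estimating a reward pays off, for instance in how the two routes generalize or degrade.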
