International Conference on Autonomous Agents and Multiagent Systems

Boosted and Reward-regularized Classification for Apprenticeship Learning



Abstract

This paper deals with the problem of learning from demonstrations, in which an agent called the apprentice tries to learn a behavior from demonstrations given by another agent, called the expert. To address this problem, we place ourselves in the Markov Decision Process (MDP) framework, which is well suited to sequential decision-making problems. One way to tackle the problem is to reduce it to classification, but doing so ignores the MDP structure. Other methods do take the MDP structure into account, but they need to solve MDPs, which is a difficult task, and/or require a choice of features, which is problem-dependent. The main contribution of this paper is to extend a large-margin approach, a classification method, by adding a regularization term that takes the MDP structure into account. The derived algorithm, called Reward-regularized Classification for Apprenticeship Learning (RCAL), does not need to solve MDPs. Its major advantage, however, is that it can be boosted, which avoids the choice of features, a drawback of parametric approaches. A state-of-the-art experiment (Highway) and generic experiments (structured Garnets) are conducted to compare the performance of RCAL with algorithms from the literature.
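To make the idea concrete, here is a hedged sketch of what such a reward-regularized large-margin criterion could look like; the exact criterion, notation, and sampling scheme below are assumptions for illustration, not taken from the paper. Given expert state-action pairs (s_i, a_i^E), a score function q, a margin function l, and sampled transitions (s, a, s'), one could minimize

\min_{q}\ \frac{1}{N}\sum_{i=1}^{N}\Big[\max_{a\in A}\big(q(s_i,a)+l(s_i,a)\big)-q\big(s_i,a_i^E\big)\Big]\;+\;\lambda\sum_{(s,a,s')}\Big|\,q(s,a)-\gamma\max_{a'\in A} q(s',a')\,\Big|

where the first term is a standard large-margin classification loss on the expert data, and the second term penalizes the magnitude of the reward implied by q through the Bellman equation, which is how the MDP structure enters without solving an MDP. Minimizing such a criterion by functional gradient descent over weak learners would give the boosted, feature-free variant described in the abstract.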
