Journal of Zhejiang University Science

Convergence analysis of an incremental approach to online inverse reinforcement learning

Abstract

Interest in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently increased. This paper deals with an incremental approach to online IRL. First, the convergence of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during learning and the regret were established with a detailed proof. An online algorithm based on incremental error correction was then derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, so that the estimate approaches a target optimal value. The proposed method was tested in a driving simulation experiment and was found to recover an adequate reward function efficiently.
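The increment-on-mismatch rule described in the abstract admits a natural perceptron-style reading under a linear reward model r(s, a) = w · φ(s, a): whenever the action that is greedy under the current reward estimate disagrees with the expert's action, the weight vector is nudged toward the expert's feature direction. The sketch below illustrates that reading only; it is not the paper's algorithm, and the names phi, expert_policy, eta, and incremental_irl are illustrative assumptions.

```python
import numpy as np

def greedy_action(w, phi, state, actions):
    """Action maximizing the current reward estimate w . phi(s, a)."""
    return max(actions, key=lambda a: w @ phi(state, a))

def incremental_irl(states, actions, phi, expert_policy, eta=0.1, dim=4):
    """Perceptron-style sketch of incremental error-correcting IRL.

    On each observed state, compare the greedy action under the current
    reward estimate with the expert's action; on a mismatch, add an
    increment in the direction of the expert's features.
    """
    w = np.zeros(dim)        # current reward-weight estimate
    mistakes = 0             # mismatch counter (bounded in the paper's analysis)
    for s in states:
        a_hat = greedy_action(w, phi, s, actions)
        a_star = expert_policy(s)
        if a_hat != a_star:  # action mismatch: apply an increment
            w += eta * (phi(s, a_star) - phi(s, a_hat))
            mistakes += 1
    return w, mistakes

# Toy usage: 20 states, 3 actions, random 4-dim features,
# and a hypothetical expert that always picks action 0.
rng = np.random.default_rng(0)
feats = {(s, a): rng.normal(size=4) for s in range(20) for a in range(3)}
w, m = incremental_irl(range(20), range(3),
                       lambda s, a: feats[(s, a)], lambda s: 0)
print(w, m)
```

The mistake counter in the sketch corresponds to the quantity whose bound the paper proves; under the linear-reward assumption, each increment moves the estimate toward a reward under which the expert's actions are greedy.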