JMLR: Workshop and Conference Proceedings

Convergence of Value Aggregation for Imitation Learning

Abstract

Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence with improving performance. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems.
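As context for how the framework operates, below is a minimal sketch of a value-aggregation-style loop on a toy one-dimensional problem. The induced state distribution, the expert, the quadratic surrogate loss, and the proximal regularizer lam are illustrative assumptions rather than the construction analyzed in the paper; the sketch only shows the interleaving of policy evaluation (rollouts), data aggregation, and regularized policy optimization that the abstract describes.

import numpy as np

# Toy 1-D imitation problem used only to illustrate the loop structure.
rng = np.random.default_rng(0)

def rollout_states(theta, n=200):
    # Policy evaluation: sample states visited under the current policy.
    # Assumption (for illustration): the induced state distribution is a
    # Gaussian centered at the scalar policy parameter.
    return rng.normal(loc=theta, scale=1.0, size=n)

def expert_action(states):
    # Hypothetical expert the learner tries to imitate.
    return np.tanh(states)

def ftrl_update(datasets, theta_prev, lam):
    # Policy optimization as an online-learning (follow-the-regularized-leader) step:
    #   theta_new = argmin_theta  sum over all aggregated states s of (theta - expert(s))^2
    #                             + lam * (theta - theta_prev)^2
    # With a quadratic surrogate loss the minimizer has a closed form.
    targets = np.concatenate([expert_action(s) for s in datasets])
    return (targets.sum() + lam * theta_prev) / (targets.size + lam)

def value_aggregation(n_iters=20, lam=0.0):
    theta, datasets = 0.0, []
    for _ in range(n_iters):
        states = rollout_states(theta)              # evaluate the current policy
        datasets.append(states)                     # aggregate data across iterations
        theta = ftrl_update(datasets, theta, lam)   # optimize on the aggregated data
    return theta                                    # the last policy in the sequence

print("last policy, no regularization:  ", value_aggregation(lam=0.0))
print("last policy, with regularization:", value_aggregation(lam=50.0))

Setting lam > 0 plays the role of the regularization discussed in the abstract: it is what lets the policy sequence be stabilized so that the last policy, rather than the best policy found along the way, can be relied on.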
