...
JMLR: Workshop and Conference Proceedings

Accelerating Imitation Learning with Predictive Models

Abstract

Sample efficiency is critical in solving real-world reinforcement learning problems where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, which interleaves policy evaluation and policy optimization, is a particularly effective technique with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, thereby making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. These two methods leverage a model to predict future gradients to speed up policy learning. When the model oracle is learned online, these algorithms can provably accelerate the best known convergence rate up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox (Juditsky et al., 2011), and admit a simple constructive FTL-style analysis of performance.
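The abstract names the predictive-gradient idea only at a high level. As a rough illustration of the optimistic update pattern behind "FTL with prediction" and stochastic Mirror-Prox (Juditsky et al., 2011), the following minimal Python sketch takes a model-based look-ahead step, queries the true gradient there, and then updates from the original iterate. The quadratic objective, the `true_gradient` and `predicted_gradient` stand-ins, and the fixed step size `eta` are illustrative assumptions, not the paper's MoBIL-VI or MoBIL-Prox algorithms.

```python
import numpy as np

# Sketch of an optimistic/extragradient update: use a cheap model
# prediction of the next gradient to take a look-ahead half-step,
# spend one "real interaction" evaluating the true gradient at that
# look-ahead point, then update the original iterate with it.
# (Euclidean mirror map; toy quadratic loss as an assumption.)

rng = np.random.default_rng(0)
dim, eta = 5, 0.3
target = np.ones(dim)            # hypothetical optimum of the toy loss

def true_gradient(theta):
    # Noisy gradient of f(theta) = 0.5 * ||theta - target||^2,
    # standing in for a costly policy-evaluation rollout.
    return (theta - target) + 0.1 * rng.standard_normal(dim)

def predicted_gradient(theta):
    # Stand-in for the learned predictive model's gradient estimate;
    # in the paper this oracle is itself learned online.
    return theta - target

theta = np.zeros(dim)
for t in range(100):
    look_ahead = theta - eta * predicted_gradient(theta)  # model-based half-step
    g = true_gradient(look_ahead)                         # one real interaction
    theta = theta - eta * g                               # prox-style update

print("distance to optimum:", np.linalg.norm(theta - target))
```

When the predicted gradient is close to the true one, the look-ahead point already sits near where the next iterate will land, so each real interaction is used more effectively; this is the intuition behind the accelerated convergence rate claimed when the model oracle is learned online.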