International Joint Conference on Artificial Intelligence

Increasingly Cautious Optimism for Practical PAC-MDP Exploration

Abstract

The exploration strategy is an essential part of a learning agent in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; yet their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies: Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter upper bound on sample complexity. We then demonstrate their significantly improved performance through empirical results.
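The abstract does not spell out the ICO mechanism, but the R-MAX idea it builds on is standard: state-action pairs visited fewer than m times are treated as "unknown" and assigned the optimistic value R_max/(1-γ) during planning. The sketch below is a minimal, illustrative R-MAX-style agent in which a hypothetical decaying known-state threshold stands in for the ICO principle; the names (RMaxLikeAgent, known_threshold, the m0/decay schedule) are our own assumptions for illustration, not the actual ICR/ICV rules from the paper.

```python
# Minimal R-MAX-style exploration sketch on a small tabular MDP.
# The decaying known_threshold() schedule is only an illustration of the
# increasingly-cautious-optimism idea; the exact ICR/ICV constructions
# are defined in the paper and differ from this toy schedule.
import numpy as np

class RMaxLikeAgent:
    def __init__(self, n_states, n_actions, r_max=1.0, gamma=0.95,
                 m0=20, decay=0.99):
        self.nS, self.nA = n_states, n_actions
        self.r_max, self.gamma = r_max, gamma
        self.m0, self.decay = m0, decay  # initial threshold and decay rate (assumed)
        self.counts = np.zeros((n_states, n_actions))            # visit counts N(s,a)
        self.r_sum = np.zeros((n_states, n_actions))             # accumulated rewards
        self.t_counts = np.zeros((n_states, n_actions, n_states))  # transition counts
        self.steps = 0

    def known_threshold(self):
        # Hypothetical ICO-style schedule: the agent demands fewer samples
        # before trusting its empirical model as experience accumulates.
        return max(1, int(self.m0 * self.decay ** self.steps))

    def update(self, s, a, r, s2):
        self.counts[s, a] += 1
        self.r_sum[s, a] += r
        self.t_counts[s, a, s2] += 1
        self.steps += 1

    def plan(self, n_iters=200):
        # Value iteration on the optimistic empirical model: unknown (s,a)
        # pairs get the maximal achievable value R_max / (1 - gamma).
        m = self.known_threshold()
        q = np.zeros((self.nS, self.nA))
        v_opt = self.r_max / (1.0 - self.gamma)
        for _ in range(n_iters):
            v = q.max(axis=1)
            for s in range(self.nS):
                for a in range(self.nA):
                    n = self.counts[s, a]
                    if n < m:
                        q[s, a] = v_opt  # optimism drives exploration of unknown pairs
                    else:
                        r_hat = self.r_sum[s, a] / n
                        p_hat = self.t_counts[s, a] / n
                        q[s, a] = r_hat + self.gamma * p_hat @ v
        return q

    def act(self, s):
        return int(self.plan().argmax(axis=1)[s])
```

The only ICO-flavored element here is known_threshold(): instead of the fixed threshold m of plain R-MAX, the required sample count shrinks as experience accumulates, so unnecessarily cautious exploration is cut off earlier. The actual ICR/ICV schedules, and the tighter sample complexity bound that justifies them, are derived in the paper.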