International Conference on Machine Learning

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Abstract

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an O(H√(SAT)) Bayesian regret bound for PSRL in finite-horizon episodic Markov decision processes. This improves upon the best previous Bayesian regret bound of O(HS√(AT)) for any reinforcement learning algorithm. Our theoretical results are supported by extensive empirical evaluation.

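The PSRL loop the abstract refers to works as follows: at the start of each episode, sample a complete MDP from the current posterior, solve the sample by backward induction over the horizon, act greedily with respect to that solution for the episode, then update the posterior with the observed transitions and rewards. Below is a minimal, hypothetical sketch of this loop for a tabular finite-horizon MDP, assuming Dirichlet posteriors over transitions and a simple Gaussian-style posterior over mean rewards; the toy environment, priors, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular problem sizes (S states, A actions, horizon H).
S, A, H = 5, 2, 10

# Toy ground-truth MDP, used only to generate experience.
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # transitions, shape (S, A, S)
R_true = rng.uniform(0.0, 1.0, size=(S, A))      # mean rewards

# Posterior sufficient statistics: Dirichlet counts for transitions,
# running sums/counts for a crude Gaussian posterior over mean rewards.
alpha = np.ones((S, A, S))
r_sum = np.zeros((S, A))
r_cnt = np.zeros((S, A))

def sample_mdp():
    """Draw one complete MDP from the current posterior."""
    P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)]
                  for s in range(S)])
    mean = r_sum / np.maximum(r_cnt, 1.0)
    R = rng.normal(mean, 1.0 / np.sqrt(r_cnt + 1.0))  # noise shrinks with data
    return P, R

def greedy_policy(P, R):
    """Backward induction over the horizon for the sampled MDP."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                  # (S, A, S) @ (S,) -> (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

for episode in range(200):
    P_s, R_s = sample_mdp()            # one posterior sample per episode
    policy = greedy_policy(P_s, R_s)   # optimal policy for the sample
    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = int(rng.choice(S, p=P_true[s, a]))
        reward = R_true[s, a] + rng.normal(0.0, 0.1)
        alpha[s, a, s_next] += 1.0     # conjugate Dirichlet update
        r_sum[s, a] += reward
        r_cnt[s, a] += 1.0
        s = s_next
```

The contrast with optimism-driven algorithms such as UCRL2 is visible in the sketch: no confidence bonus is added anywhere; exploration comes entirely from the randomness of the per-episode posterior sample.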