...
JMLR: Workshop and Conference Proceedings

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?



Abstract

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an $\tilde{O}(H\sqrt{SAT})$ Bayesian regret bound for PSRL in finite-horizon episodic Markov decision processes. This improves upon the best previous Bayesian regret bound of $\tilde{O}(HS\sqrt{AT})$ for any reinforcement learning algorithm. Our theoretical results are supported by extensive empirical evaluation.
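PSRL itself is simple to state: at the start of each episode, draw one MDP from the current posterior, solve it, and follow the resulting policy for the whole episode. Below is a minimal sketch for a tabular finite-horizon MDP, assuming a Dirichlet prior over transitions; the `env` interface and the use of a running mean in place of a full reward posterior are illustrative assumptions, not details from the paper.

```python
import numpy as np

def psrl_episode(counts, rewards, H, rng):
    """Sample one MDP from the posterior and solve it by backward induction.

    counts[s, a] holds Dirichlet parameters (prior plus observed transition
    counts); rewards[s, a] is the current mean-reward estimate. Returns a
    nonstationary policy of shape (H, S).
    """
    S, A, _ = counts.shape
    # Posterior sampling step: one draw of transition probabilities per (s, a).
    P = np.stack([[rng.dirichlet(counts[s, a]) for a in range(A)]
                  for s in range(S)])                    # shape (S, A, S)
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):                         # backward induction
        Q = rewards + P @ V                              # shape (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def run_psrl(env, S, A, H, num_episodes, seed=0):
    """Run PSRL against a hypothetical env exposing reset() -> state and
    step(action) -> (next_state, reward)."""
    rng = np.random.default_rng(seed)
    counts = np.ones((S, A, S))          # Dirichlet(1, ..., 1) prior
    rewards = np.zeros((S, A))           # running mean-reward estimates
    visits = np.zeros((S, A))
    for _ in range(num_episodes):
        policy = psrl_episode(counts, rewards, H, rng)
        s = env.reset()
        for h in range(H):
            a = policy[h, s]
            s_next, r = env.step(a)
            counts[s, a, s_next] += 1    # conjugate posterior update
            visits[s, a] += 1
            rewards[s, a] += (r - rewards[s, a]) / visits[s, a]
            s = s_next
```

Note there is no confidence-set construction as in UCRL2: the randomness of the posterior draw alone drives exploration, which is the phenomenon the paper identifies as the source of PSRL's advantage over optimism.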
