International Conference on Machine Learning

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Abstract

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an O(H√(SAT)) Bayesian regret bound for PSRL in finite-horizon episodic Markov decision processes. This improves upon the best previous Bayesian regret bound of O(HS√(AT)) for any reinforcement learning algorithm. Our theoretical results are supported by extensive empirical evaluation.

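The PSRL loop the abstract refers to works as follows: at the start of each episode, sample a complete MDP from the current posterior, solve the sample by backward induction over the horizon, act greedily with respect to that solution for the episode, then update the posterior with the observed transitions and rewards. Below is a minimal, hypothetical sketch of this loop for a tabular finite-horizon MDP, assuming Dirichlet posteriors over transitions and a simple Gaussian-style posterior over mean rewards; the toy environment, priors, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular problem sizes (S states, A actions, horizon H).
S, A, H = 5, 2, 10

# Toy ground-truth MDP, used only to generate experience.
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # transitions, shape (S, A, S)
R_true = rng.uniform(0.0, 1.0, size=(S, A))      # mean rewards

# Posterior sufficient statistics: Dirichlet counts for transitions,
# running sums/counts for a crude Gaussian posterior over mean rewards.
alpha = np.ones((S, A, S))
r_sum = np.zeros((S, A))
r_cnt = np.zeros((S, A))

def sample_mdp():
    """Draw one complete MDP from the current posterior."""
    P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)]
                  for s in range(S)])
    mean = r_sum / np.maximum(r_cnt, 1.0)
    R = rng.normal(mean, 1.0 / np.sqrt(r_cnt + 1.0))  # noise shrinks with data
    return P, R

def greedy_policy(P, R):
    """Backward induction over the horizon for the sampled MDP."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                  # (S, A, S) @ (S,) -> (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

for episode in range(200):
    P_s, R_s = sample_mdp()            # one posterior sample per episode
    policy = greedy_policy(P_s, R_s)   # optimal policy for the sample
    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = int(rng.choice(S, p=P_true[s, a]))
        reward = R_true[s, a] + rng.normal(0.0, 0.1)
        alpha[s, a, s_next] += 1.0     # conjugate Dirichlet update
        r_sum[s, a] += reward
        r_cnt[s, a] += 1.0
        s = s_next
```

The contrast with optimism-driven algorithms such as UCRL2 is visible in the sketch: no confidence bonus is added anywhere; exploration comes entirely from the randomness of the per-episode posterior sample.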