JMLR: Workshop and Conference Proceedings

The Uncertainty Bellman Equation and Exploration



Abstract

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
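
As a rough illustration of the idea described in the abstract (not the paper's exact algorithm), the sketch below iterates a UBE-style recursion u <- nu + gamma^2 * E[u'] to its fixed point in a tabular MDP, then perturbs the Q-values by a sampled multiple of sqrt(u) when selecting actions, in place of epsilon-greedy. The function names, the gamma^2 propagation constant, and the local-uncertainty term nu are illustrative assumptions.

    import numpy as np

    def solve_ube(P, pi, nu, gamma=0.99, iters=1000, tol=1e-8):
        """Iterate a UBE-style recursion u <- nu + gamma^2 * E_{s',a'}[u] to its fixed point.

        P[s, a, s'] are transition probabilities, pi[s, a] is the policy, and
        nu[s, a] is a local uncertainty estimate (e.g. shrinking with visit counts).
        The fixed point u is meant to upper-bound the posterior variance of the
        Q-values, in the spirit of the abstract; the exact constants are assumptions.
        """
        S, A = nu.shape
        u = np.zeros((S, A))
        for _ in range(iters):
            # Expected next-step uncertainty: sum_{s'} P(s'|s,a) * sum_{a'} pi(a'|s') u(s',a')
            next_u = P.reshape(S * A, S) @ (pi * u).sum(axis=1)
            u_new = nu + (gamma ** 2) * next_u.reshape(S, A)
            if np.max(np.abs(u_new - u)) < tol:
                return u_new
            u = u_new
        return u

    def ube_action(Q, u, s, beta=1.0, rng=None):
        """In place of epsilon-greedy: perturb Q by beta * zeta * sqrt(u), zeta ~ N(0, 1)."""
        rng = np.random.default_rng() if rng is None else rng
        zeta = rng.standard_normal(Q.shape[1])
        return int(np.argmax(Q[s] + beta * zeta * np.sqrt(u[s])))

In the large-scale setting the abstract targets, u would presumably be produced by a learned uncertainty estimate rather than exact tabular iteration; the tabular form above only makes the fixed-point structure explicit.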
