Tighter Value Function Bounds for Bayesian Reinforcement Learning

AAAI Conference on Artificial Intelligence

Abstract

Bayesian reinforcement learning (BRL) provides a principled framework for the optimal exploration-exploitation tradeoff in reinforcement learning. We focus on model-based BRL, which offers a compact formulation of the optimal tradeoff from the Bayesian perspective. However, computing the Bayes-optimal policy remains a computational challenge. In this paper, we propose a novel approach to computing tighter bounds on the Bayes-optimal value function, which is crucial for improving the performance of many model-based BRL algorithms. We then show how our bounds can be integrated into real-time AO* heuristic search, and provide a theoretical analysis of the impact of improved bounds on search efficiency. We also provide empirical results on standard BRL domains that demonstrate the effectiveness of our approach.
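
For context, the object being bounded is the solution of the Bayes-adaptive Bellman optimality equation, in which the state is augmented with the posterior belief over MDP models. The following shows the standard formulation together with the loose baseline bounds that any tighter bounds improve on; this is generic BRL background, not notation taken from the paper itself.

```latex
% Bayes-adaptive Bellman optimality equation: the state is augmented
% with the belief b over MDP models, and b^{s,a,s'} denotes the
% posterior belief after observing the transition (s, a, s').
\[
V^*(s, b) = \max_{a \in A} \sum_{s'} \Pr(s' \mid s, a, b)
  \left[ R(s, a, s') + \gamma \, V^*\!\left(s', b^{s,a,s'}\right) \right]
\]
% Loose baseline bounds that hold for any belief; bound-guided search
% prunes more aggressively as this interval is tightened:
\[
\frac{R_{\min}}{1 - \gamma} \;\le\; V^*(s, b) \;\le\; \frac{R_{\max}}{1 - \gamma}
\]
```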
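To illustrate why tighter bounds matter, here is a minimal runnable sketch, not the paper's algorithm: depth-limited expectimax over the belief tree of a toy two-state, two-action Bayes-adaptive MDP with a Dirichlet prior over transitions, falling back on the loose bounds above at the leaves. The rewards, the prior, and the `bounds` helper are hypothetical choices made purely for illustration.

```python
# Minimal sketch: interval bounds on the Bayes-optimal value function
# via depth-limited expectimax over the belief tree. Toy 2-state,
# 2-action MDP with a Dirichlet prior over transitions; rewards and
# prior are hypothetical, chosen only for illustration.

GAMMA = 0.95
R_MIN, R_MAX = 0.0, 1.0
V_MIN, V_MAX = R_MIN / (1 - GAMMA), R_MAX / (1 - GAMMA)

N_STATES, N_ACTIONS = 2, 2
REWARD = [[0.1, 0.0],          # REWARD[s][a], hypothetical values
          [0.8, 1.0]]

def bounds(s, counts, depth):
    """Return an interval [lo, hi] containing V*(s, b), where the
    belief b is a tuple of Dirichlet counts indexed by (s, a).
    Leaves fall back on the loose bounds R_MIN/(1-g), R_MAX/(1-g)."""
    if depth == 0:
        return V_MIN, V_MAX
    best_lo = best_hi = float("-inf")
    for a in range(N_ACTIONS):
        c = counts[s * N_ACTIONS + a]
        total = sum(c)
        lo = hi = REWARD[s][a]
        for s2 in range(N_STATES):
            p = c[s2] / total                  # posterior predictive prob.
            # Bayesian update: increment the count of the observed outcome.
            c2 = list(counts)
            c2[s * N_ACTIONS + a] = tuple(
                n + (1 if j == s2 else 0) for j, n in enumerate(c))
            l2, h2 = bounds(s2, tuple(c2), depth - 1)
            lo += GAMMA * p * l2
            hi += GAMMA * p * h2
        best_lo, best_hi = max(best_lo, lo), max(best_hi, hi)
    return best_lo, best_hi

if __name__ == "__main__":
    prior = tuple((1, 1) for _ in range(N_STATES * N_ACTIONS))  # uniform prior
    for d in (1, 2, 3, 4):
        lo, hi = bounds(0, prior, d)
        print(f"depth {d}: V* in [{lo:.3f}, {hi:.3f}]  (gap {hi - lo:.3f})")
```

The printed intervals shrink geometrically with depth, by a factor of gamma per level. In bound-guided methods such as AO* heuristic search, such intervals drive pruning: an action whose upper bound falls below another action's lower bound can never be Bayes-optimal, so any tightening of the bounds translates directly into a smaller search tree.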
