Tighter Value Function Bounds for Bayesian Reinforcement Learning

AAAI Conference on Artificial Intelligence

Abstract

Bayesian reinforcement learning (BRL) provides a principled framework for the optimal exploration-exploitation tradeoff in reinforcement learning. We focus on model-based BRL, which offers a compact formulation of the optimal tradeoff from the Bayesian perspective. However, computing the Bayes-optimal policy remains a computational challenge. In this paper, we propose a novel approach to computing tighter bounds on the Bayes-optimal value function, which is crucial for improving the performance of many model-based BRL algorithms. We then show how our bounds can be integrated into real-time AO* heuristic search, and provide a theoretical analysis of the impact of improved bounds on search efficiency. We also provide empirical results on standard BRL domains that demonstrate the effectiveness of our approach.
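
For context, the object being bounded is the solution of the Bayes-adaptive Bellman optimality equation, in which the state is augmented with the posterior belief over MDP models. The following shows the standard formulation together with the loose baseline bounds that any tighter bounds improve on; this is generic BRL background, not notation taken from the paper itself.

```latex
% Bayes-adaptive Bellman optimality equation: the state is augmented
% with the belief b over MDP models, and b^{s,a,s'} denotes the
% posterior belief after observing the transition (s, a, s').
\[
V^*(s, b) = \max_{a \in A} \sum_{s'} \Pr(s' \mid s, a, b)
  \left[ R(s, a, s') + \gamma \, V^*\!\left(s', b^{s,a,s'}\right) \right]
\]
% Loose baseline bounds that hold for any belief; bound-guided search
% prunes more aggressively as this interval is tightened:
\[
\frac{R_{\min}}{1 - \gamma} \;\le\; V^*(s, b) \;\le\; \frac{R_{\max}}{1 - \gamma}
\]
```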
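To illustrate why tighter bounds matter, here is a minimal runnable sketch, not the paper's algorithm: depth-limited expectimax over the belief tree of a toy two-state, two-action Bayes-adaptive MDP with a Dirichlet prior over transitions, falling back on the loose bounds above at the leaves. The rewards, the prior, and the `bounds` helper are hypothetical choices made purely for illustration.

```python
# Minimal sketch: interval bounds on the Bayes-optimal value function
# via depth-limited expectimax over the belief tree. Toy 2-state,
# 2-action MDP with a Dirichlet prior over transitions; rewards and
# prior are hypothetical, chosen only for illustration.

GAMMA = 0.95
R_MIN, R_MAX = 0.0, 1.0
V_MIN, V_MAX = R_MIN / (1 - GAMMA), R_MAX / (1 - GAMMA)

N_STATES, N_ACTIONS = 2, 2
REWARD = [[0.1, 0.0],          # REWARD[s][a], hypothetical values
          [0.8, 1.0]]

def bounds(s, counts, depth):
    """Return an interval [lo, hi] containing V*(s, b), where the
    belief b is a tuple of Dirichlet counts indexed by (s, a).
    Leaves fall back on the loose bounds R_MIN/(1-g), R_MAX/(1-g)."""
    if depth == 0:
        return V_MIN, V_MAX
    best_lo = best_hi = float("-inf")
    for a in range(N_ACTIONS):
        c = counts[s * N_ACTIONS + a]
        total = sum(c)
        lo = hi = REWARD[s][a]
        for s2 in range(N_STATES):
            p = c[s2] / total                  # posterior predictive prob.
            # Bayesian update: increment the count of the observed outcome.
            c2 = list(counts)
            c2[s * N_ACTIONS + a] = tuple(
                n + (1 if j == s2 else 0) for j, n in enumerate(c))
            l2, h2 = bounds(s2, tuple(c2), depth - 1)
            lo += GAMMA * p * l2
            hi += GAMMA * p * h2
        best_lo, best_hi = max(best_lo, lo), max(best_hi, hi)
    return best_lo, best_hi

if __name__ == "__main__":
    prior = tuple((1, 1) for _ in range(N_STATES * N_ACTIONS))  # uniform prior
    for d in (1, 2, 3, 4):
        lo, hi = bounds(0, prior, d)
        print(f"depth {d}: V* in [{lo:.3f}, {hi:.3f}]  (gap {hi - lo:.3f})")
```

The printed intervals shrink geometrically with depth, by a factor of gamma per level. In bound-guided methods such as AO* heuristic search, such intervals drive pruning: an action whose upper bound falls below another action's lower bound can never be Bayes-optimal, so any tightening of the bounds translates directly into a smaller search tree.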
