Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

Abstract

Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayes-optimal behavior in an unknown MDP is equivalent to optimal behavior in the known belief-space MDP, although the size of this belief-space MDP grows exponentially with the amount of history retained, and is potentially infinite. We show how an agent can use one particular MCTS algorithm, Forward Search Sparse Sampling (FSSS), in an efficient way to act nearly Bayes-optimally for all but a polynomial number of steps, assuming that FSSS can be used to act efficiently in any possible underlying MDP.
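
The belief-space construction mentioned in the abstract can be illustrated with a toy sketch. It is not the paper's FSSS procedure. It assumes a hypothetical 2-state, 2-action MDP with an independent Dirichlet prior over each transition row, and all names (`uniform_prior`, `predictive`, `update`, `hyperstates_at_depth`) are invented for illustration. A belief is a table of Dirichlet counts, a hyperstate is a (state, belief) pair, and every observed transition yields a new belief, so the set of reachable hyperstates keeps growing with the lookahead depth.

```python
from fractions import Fraction

N_STATES, N_ACTIONS = 2, 2  # hypothetical toy problem size (an assumption)


def uniform_prior():
    """Belief = one tuple of Dirichlet counts per (state, action) pair."""
    return tuple((1,) * N_STATES for _ in range(N_STATES * N_ACTIONS))


def predictive(belief, s, a):
    """Posterior-predictive next-state probabilities for (s, a)."""
    row = belief[s * N_ACTIONS + a]
    total = sum(row)
    return [Fraction(c, total) for c in row]


def update(belief, s, a, s_next):
    """Return the successor belief after observing s --a--> s_next."""
    idx = s * N_ACTIONS + a
    row = list(belief[idx])
    row[s_next] += 1
    return belief[:idx] + (tuple(row),) + belief[idx + 1:]


def hyperstates_at_depth(s0, belief0, depth):
    """Set of distinct (state, belief) pairs reachable in exactly `depth` steps."""
    frontier = {(s0, belief0)}
    for _ in range(depth):
        frontier = {
            (s_next, update(b, s, a, s_next))
            for (s, b) in frontier
            for a in range(N_ACTIONS)
            for s_next in range(N_STATES)
        }
    return frontier


if __name__ == "__main__":
    prior = uniform_prior()
    for depth in range(4):
        print(depth, len(hyperstates_at_depth(0, prior, depth)))
```

In this sketch, histories that produce the same counts merge into one hyperstate, so the growth is slower than that of the raw history tree. The set is still unbounded as the depth increases, which is why exhaustive planning in the belief-space MDP is impractical and sampling-based tree search is attractive.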

Bibliographic record

Source
Conference on Uncertainty in Artificial Intelligence | 2011 | 8 pages
Authors
John Asmuth; Michael Littman
Original format PDF
Chinese Library Classification TP18-53
