
Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

Abstract

Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayes-optimal behavior in an unknown MDP is equivalent to optimal behavior in the known belief-space MDP, although the size of this belief-space MDP grows exponentially with the amount of history retained, and is potentially infinite. We show how an agent can use one particular MCTS algorithm, Forward Search Sparse Sampling (FSSS), in an efficient way to act nearly Bayes-optimally for all but a polynomial number of steps, assuming that FSSS can be used to act efficiently in any possible underlying MDP.
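The planning view in the abstract can be made concrete with a small sketch. Below is a minimal Python rendering of vanilla sparse sampling applied to belief states, in the spirit of (but far simpler than) FSSS; the belief representation and the sample_step function are hypothetical stand-ins for a posterior over MDP models and a simulator that draws a model from that posterior.

def sparse_sample_plan(belief, depth, width, actions, sample_step, gamma=0.95):
    # Return (best_action, estimated_value) for a belief state by building
    # a depth-limited lookahead tree with `width` sampled successors per action.
    if depth == 0:
        return None, 0.0
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):
            # Hypothetical: sample_step draws an MDP from the posterior,
            # simulates one step, and returns the reward and updated belief.
            reward, next_belief = sample_step(belief, a)
            _, future = sparse_sample_plan(next_belief, depth - 1, width,
                                           actions, sample_step, gamma)
            total += reward + gamma * future
        value = total / width
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value

FSSS itself improves on this naive exhaustive recursion by maintaining upper and lower bounds on the value of each node and, roughly, expanding only branches whose bounds could still change the greedy action choice, which is what makes planning in the exponentially large belief-space MDP practical.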