Bayesian mixture modelling and inference based Thompson sampling in Monte-Carlo tree search

Abstract

Monte-Carlo tree search (MCTS) is drawing great interest in the domain of planning under uncertainty, particularly when little or no domain knowledge is available. One of the central problems is the trade-off between exploration and exploitation. In this paper we present a novel Thompson sampling approach, based on Bayesian mixture modelling and inference, to address this dilemma. The proposed Dirichlet-NormalGamma MCTS (DNG-MCTS) algorithm represents the uncertainty of the accumulated reward for actions in the MCTS search tree as a mixture of Normal distributions, and performs Bayesian inference on it by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions. Thompson sampling is used to select the best action at each decision node. Experimental results show that the proposed algorithm achieves state-of-the-art performance compared with the popular UCT algorithm in the context of online planning for general Markov decision processes.
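The action-selection idea in the abstract can be illustrated with a minimal Python sketch. It assumes one plausible structure, not necessarily the authors' exact formulation: each action at a decision node keeps Dirichlet counts over the next states observed so far (the mixture weights), each next state's return is modelled as a Normal with a NormalGamma prior (the mixture components), the immediate reward R(s, a) is known, and Thompson sampling draws one Q-value per action and picks the largest. The names `NormalGamma`, `ActionNode` and `thompson_select`, the prior hyperparameters, and the toy usage are hypothetical illustrations; backing returns up the tree is not shown.

```python
import numpy as np
from dataclasses import dataclass, field


@dataclass
class NormalGamma:
    """Hypothetical NormalGamma posterior over the mean and precision of a Normal return."""
    mu: float = 0.0     # posterior location of the mean
    lam: float = 0.01   # pseudo-observations backing mu (small = weak prior)
    alpha: float = 1.0  # Gamma shape for the precision
    beta: float = 1.0   # Gamma rate for the precision

    def update(self, x: float) -> None:
        # Conjugate update for one observed return x (beta uses the old mu and lam).
        self.beta += self.lam * (x - self.mu) ** 2 / (2.0 * (self.lam + 1.0))
        self.mu = (self.lam * self.mu + x) / (self.lam + 1.0)
        self.lam += 1.0
        self.alpha += 0.5

    def sample_mean(self, rng: np.random.Generator) -> float:
        # Draw precision tau ~ Gamma(alpha, rate=beta), then mean ~ N(mu, 1 / (lam * tau)).
        tau = rng.gamma(self.alpha, 1.0 / self.beta)
        return float(rng.normal(self.mu, 1.0 / np.sqrt(self.lam * tau)))


@dataclass
class ActionNode:
    """Hypothetical chance node for (s, a): Dirichlet counts over next states seen so far."""
    reward: float                               # immediate reward R(s, a), assumed known
    counts: dict = field(default_factory=dict)  # next state -> Dirichlet parameter

    def observe(self, next_state, prior: float = 1.0) -> None:
        # Unseen states start at the prior pseudo-count; each visit adds one.
        self.counts[next_state] = self.counts.get(next_state, prior) + 1.0


def thompson_select(actions: dict, values: dict, gamma: float,
                    rng: np.random.Generator):
    """Return argmax_a of one sampled Q(s, a) = R(s, a) + gamma * sum_s' w_s' * mu_s'."""
    best_action, best_q = None, -np.inf
    for a, node in actions.items():
        if not node.counts:  # untried action: expand it before sampling
            return a
        states = list(node.counts)
        weights = rng.dirichlet([node.counts[s] for s in states])  # mixture weights
        means = [values[s].sample_mean(rng) for s in states]       # component means
        q = node.reward + gamma * float(np.dot(weights, means))
        if q > best_q:
            best_action, best_q = a, q
    return best_action


# Usage on a toy decision node: "left" mostly reaches a high-return state.
rng = np.random.default_rng(0)
true_mean = {"good": 1.0, "bad": 0.0}
values = {s: NormalGamma() for s in true_mean}
actions = {"left": ActionNode(reward=0.0), "right": ActionNode(reward=0.0)}
for i in range(200):
    s_left = "bad" if i % 5 == 0 else "good"
    s_right = "good" if i % 5 == 0 else "bad"
    for a, s in (("left", s_left), ("right", s_right)):
        actions[a].observe(s)
        values[s].update(rng.normal(true_mean[s], 0.1))
picks = [thompson_select(actions, values, 0.95, rng) for _ in range(1000)]
print("left chosen", picks.count("left"), "of", len(picks))
```

In this sketch exploration comes from sampling the posterior rather than from a UCB-style bonus as in UCT: an action whose value is still uncertain occasionally draws a high sample and gets tried again, while well-estimated poor actions are rarely chosen.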

Bibliographic details

  • Year: 2013
  • Original format: PDF
  • Language: English