
VOI-aware MCTS



Abstract

UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB1, a sampling policy for the multi-armed bandit problem (MAB) that minimizes cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than all "arm pulls". This paper suggests an MCTS sampling policy based on Value of Information (VOI) estimates of rollouts. Empirical evaluation of the policy and comparison to UCB1 and UCT are performed on random MAB instances as well as on Computer Go.
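For context, the UCB1 baseline the abstract refers to samples the arm maximizing the empirical mean plus an exploration bonus of sqrt(2 ln t / n_i). The minimal Python sketch below illustrates that baseline on a hypothetical Bernoulli bandit and reports simple regret (the loss of the single final recommendation), which is the quantity the abstract argues matters for move selection; the paper's actual VOI-based sampling policy is not reproduced here, and the arm means and budget are illustrative assumptions.

```python
import math
import random

def ucb1_run(arm_means, budget, seed=0):
    """Run the UCB1 sampling policy on a Bernoulli multi-armed bandit.

    Returns the arm recommended after the budget is spent (highest empirical
    mean), mirroring the single "final arm pull" that collects a reward in
    MCTS-style move selection.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # times each arm has been sampled
    sums = [0.0] * k      # total reward observed per arm

    for t in range(budget):
        if t < k:
            arm = t  # sample every arm once before applying the UCB1 index
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i) exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Recommend the arm with the best empirical mean as the final move.
    return max(range(k), key=lambda i: sums[i] / counts[i])

if __name__ == "__main__":
    means = [0.4, 0.5, 0.6]   # hypothetical arm means
    rec = ucb1_run(means, budget=500)
    # Simple regret: loss of the final choice alone, as opposed to the
    # cumulative regret that UCB1 is designed to minimize.
    print("recommended arm:", rec, "simple regret:", max(means) - means[rec])
```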


