VOI-aware MCTS

Abstract

UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB1, a sampling policy for the multi-armed bandit (MAB) problem that minimizes cumulative regret. Search differs from MAB, however: in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than every "arm pull". This paper proposes an MCTS sampling policy based on value of information (VOI) estimates of rollouts. The policy is evaluated empirically and compared with UCB1 and UCT on random MAB instances and on Computer Go.
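
For orientation, the UCB1 baseline named in the abstract can be sketched as follows. This is a minimal illustration of the standard UCB1 index (sample mean plus sqrt(2 ln n / n_i)), not the paper's VOI-based policy, whose formula the abstract does not give. The function names, the Bernoulli reward model, and the final choice of the arm with the highest sample mean are assumptions made for this sketch.

```python
import math
import random


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Pick the next arm to sample using the standard UCB1 index."""
    # Sample every arm once before the index is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Index = sample mean + exploration bonus sqrt(2 ln(total) / n_arm).
    return max(
        range(len(counts)),
        key=lambda a: means[a] + c * math.sqrt(math.log(total) / counts[a]),
    )


def run_ucb1(true_means, budget, seed=0):
    """Spend `budget` samples with UCB1, then commit to one arm.

    Assumption for illustration: rewards are Bernoulli, and only the
    final choice matters, mirroring the "final arm pull" in the abstract.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(budget):
        arm = ucb1_select(counts, means)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    # Final move selection: the arm with the highest sample mean.
    best = max(range(k), key=lambda a: means[a])
    return best, counts, means
```

Calling `run_ucb1([0.2, 0.5, 0.8], budget=200)` returns the chosen arm together with the per-arm sample counts and sample means. Because UCB1 is designed to keep cumulative regret low, it tends to concentrate samples on the arm that currently looks best, even though in search only the final choice is rewarded; that mismatch is what motivates the VOI-based policy.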