Foreign-language conference: Conference on Uncertainty in Artificial Intelligence

Selecting Computations: Theory and Applications


Abstract

Sequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.
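The idea of "estimating the expected improvement in decision quality that would result from any particular simulation" can be illustrated with a minimal sketch. It is not the paper's method: it assumes each action has a Bernoulli outcome with a Beta posterior, and it uses the standard myopic (one-step) value of information. The names `myopic_voi`, `choose_computation`, and the `cost` parameter are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional, Tuple

# Each action is summarised by a Beta(alpha, beta) posterior over its
# success probability. This model is an assumption made for illustration.
Arm = Tuple[float, float]


def posterior_mean(arm: Arm) -> float:
    alpha, beta = arm
    return alpha / (alpha + beta)


def myopic_voi(arms: List[Arm], i: int) -> float:
    """Expected gain in the best posterior mean from one more simulation of arm i."""
    means = [posterior_mean(a) for a in arms]
    current_best = max(means)
    # Best mean among the other arms (unchanged by sampling arm i).
    others = [m for j, m in enumerate(means) if j != i]
    best_other = max(others) if others else float("-inf")

    alpha, beta = arms[i]
    p_success = means[i]
    mean_if_success = (alpha + 1) / (alpha + beta + 1)
    mean_if_failure = alpha / (alpha + beta + 1)

    # Average, over the two possible outcomes, of the value of the best arm
    # after the posterior update.
    expected_best = (
        p_success * max(best_other, mean_if_success)
        + (1 - p_success) * max(best_other, mean_if_failure)
    )
    return expected_best - current_best


def choose_computation(arms: List[Arm], cost: float) -> Optional[int]:
    """Return the arm to simulate next, or None to stop and act now."""
    vois = [myopic_voi(arms, i) for i in range(len(arms))]
    best = max(range(len(arms)), key=lambda i: vois[i])
    return best if vois[best] > cost else None


if __name__ == "__main__":
    uncertain = [(1.0, 1.0), (1.0, 1.0)]
    print(myopic_voi(uncertain, 0), choose_computation(uncertain, cost=0.01))
    settled = [(10.0, 1.0), (1.0, 10.0)]
    print(choose_computation(settled, cost=0.01))
```

With two equally uncertain actions the one-step value of a simulation is positive, so the sketch simulates; when one action is far ahead, no single sample can change the decision, the myopic value is zero, and the sketch stops. That second case shows a known limitation of purely myopic estimates: several samples together might still be worth their cost.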
