Venue: European Conference on Machine Learning and Knowledge Discovery in Databases

Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

Abstract

Upper Confidence Trees (UCT) are now a well-known algorithm for sequential decision making; UCT is a provably consistent variant of Monte-Carlo Tree Search. However, consistency has only been proved in the case where the action space is finite. Here we propose a proof for fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states, an infinite action space and arbitrary stochastic transition kernels. We illustrate the consistency on two benchmark problems: one is a legacy toy problem, the other a more challenging one, the famous energy unit commitment problem.
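
The abstract does not spell out the selection rule, so the following is only an illustrative sketch of how a polynomial exploration bonus and a polynomial widening schedule are commonly combined in tree search over infinite action spaces. The function names, the exponents `alpha` and `e`, and the exact form of the bonus are assumptions made for illustration and are not taken from the paper.

```python
import math


def should_widen(visits: int, alpha: float = 0.5) -> bool:
    """Hypothetical widening test: allow a new child action whenever
    floor(visits ** alpha) increases, so the number of children grows
    polynomially with the visit count. `alpha` is an assumed parameter."""
    if visits <= 1:
        return True
    return math.floor(visits ** alpha) > math.floor((visits - 1) ** alpha)


def polynomial_ucb(mean_reward: float, parent_visits: int,
                   child_visits: int, e: float = 0.5) -> float:
    """Assumed score: empirical mean plus a bonus that is polynomial
    (rather than logarithmic) in the parent's visit count."""
    return mean_reward + math.sqrt(parent_visits ** e / child_visits)


def select_action(children: dict, parent_visits: int, e: float = 0.5):
    """Pick the child with the highest score.
    `children` maps action -> (mean_reward, child_visits)."""
    return max(
        children,
        key=lambda a: polynomial_ucb(children[a][0], parent_visits,
                                     children[a][1], e),
    )


if __name__ == "__main__":
    stats = {"a": (0.5, 10), "b": (0.4, 1)}
    print(select_action(stats, parent_visits=11))
    print([n for n in range(1, 20) if should_widen(n)])
```

In this sketch a rarely visited action receives a large bonus, so it is tried again before the search commits to the action with the best empirical mean, while the widening test limits how quickly new actions are sampled from the infinite action space.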
