Practical Open-Loop Optimistic Planning

Abstract

We consider the problem of online planning in a Markov Decision Process when given access only to a generative model, restricted to open-loop policies (i.e., sequences of actions), and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
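To give a feel for why a KL-based upper-confidence bound can be tighter than a Hoeffding-style one, here is a minimal sketch. It is not taken from the paper: it assumes rewards in [0, 1], uses the Bernoulli KL divergence, and treats `threshold` as a hypothetical exploration parameter. The function names are illustrative only.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def hoeffding_ucb(mean, count, threshold):
    """Hoeffding-style bound: empirical mean plus a square-root bonus (may exceed 1)."""
    return mean + math.sqrt(threshold / (2 * count))


def kl_ucb(mean, count, threshold, iters=50):
    """Largest q in [mean, 1] with count * kl(mean, q) <= threshold, found by bisection."""
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if count * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still plausible, search higher
        else:
            hi = mid  # mid is too optimistic, search lower
    return lo


if __name__ == "__main__":
    # Hypothetical numbers: empirical mean 0.9 from 5 samples.
    threshold = math.log(100)  # assumed value, for illustration only
    print("Hoeffding:", hoeffding_ucb(0.9, 5, threshold))
    print("KL:       ", kl_ucb(0.9, 5, threshold))
```

By Pinsker's inequality, the KL bound never exceeds the Hoeffding-style bound for the same threshold, and it always stays within [0, 1], which is one way an optimistic planner can be made less conservative.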