Adaptive Sampling Algorithm for Solving Markov Decision Processes

Abstract

Based on recent results for multi-armed bandit problems, the authors propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with an infinite state space but a finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds. The estimate it produces is proven to be asymptotically unbiased, and the worst possible bias is bounded by a quantity that converges to zero at a rate of $O(H \ln N / N)$, where $H$ is the horizon length and $N$ is the total number of samples used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A| N)^H)$, independent of the size of the state space, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding-horizon control to solve infinite-horizon MDPs.
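
The recursive idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the UCB1-style index, the scaling of the exploration bonus by the number of remaining stages, the count-weighted average returned as the estimate, and the toy MDP are all assumptions chosen for illustration. Rewards are assumed to lie in [0, 1].

```python
import math
import random


def adaptive_sampling_value(state, stage, horizon, actions, step, n_samples, rng):
    """Estimate the optimal value-to-go from `state` at `stage` (hypothetical sketch).

    `step(state, action, rng)` is an assumed simulator returning (next_state, reward).
    """
    if n_samples < len(actions):
        raise ValueError("n_samples must be at least the number of actions")
    if stage == horizon:
        return 0.0  # no reward is collected after the final stage

    counts = {a: 0 for a in actions}    # times each action was sampled here
    totals = {a: 0.0 for a in actions}  # summed sampled returns per action
    scale = horizon - stage             # assumed range of returns, rewards in [0, 1]

    def sample(action):
        # Simulate one transition, then estimate the value of the successor
        # state recursively with the same sampling budget.
        next_state, reward = step(state, action, rng)
        future = adaptive_sampling_value(
            next_state, stage + 1, horizon, actions, step, n_samples, rng
        )
        counts[action] += 1
        totals[action] += reward + future

    # Initialization: sample every action once.
    for action in actions:
        sample(action)

    # Adaptive phase: sample the action with the largest upper confidence index.
    for n in range(len(actions), n_samples):
        def index(action):
            mean = totals[action] / counts[action]
            return mean + scale * math.sqrt(2.0 * math.log(n) / counts[action])

        sample(max(actions, key=index))

    # Average of per-action mean returns, weighted by how often each was sampled.
    return sum(totals.values()) / n_samples


def toy_step(state, action, rng):
    """Assumed toy MDP: real-valued state, action 1 pays 1.0 and action 0 pays 0.0."""
    next_state = state + rng.gauss(0.0, 1.0)
    reward = 1.0 if action == 1 else 0.0
    return next_state, reward


rng = random.Random(0)
estimate = adaptive_sampling_value(0.0, 0, 2, [0, 1], toy_step, 8, rng)
print(estimate)
```

In this sketch every sampled state draws `n_samples` successor states, so a horizon of `horizon` stages visits on the order of `n_samples ** horizon` states regardless of how large the state space is. That matches the exponential dependence on the horizon stated in the abstract. Because the returned estimate averages over all sampled actions, including suboptimal ones, it tends to sit below the optimal value for small sampling budgets.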
