Adaptive Sampling Algorithm for Solving Markov Decision Processes

Abstract

Based on recent results for multi-armed bandit problems, the authors propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds. It is proven that the estimate produced by the algorithm is asymptotically unbiased and that the worst possible bias is bounded by a quantity that converges to zero at rate $O(H \ln N / N)$, where $H$ is the horizon length and $N$ is the total number of samples used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A| N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
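The abstract gives no pseudocode, so the following is only a minimal sketch of how bandit-style adaptive sampling for a finite horizon MDP might look. It assumes a UCB1-type index for choosing which action to sample, rewards scaled so that the total return over the horizon lies in [0, 1], and a hypothetical simulator `step(state, action, rng)` that returns `(next_state, reward)`. The names `ams_value` and `toy_step` are illustrative and do not come from the source.

```python
import math
import random


def ams_value(state, stage, horizon, actions, step, n_samples, rng):
    """Estimate the optimal value-to-go from `state` at `stage` (sketch only)."""
    if stage == horizon:
        return 0.0  # nothing left to collect after the final stage

    counts = {a: 0 for a in actions}   # number of times each action was sampled
    q_sum = {a: 0.0 for a in actions}  # running sum of sampled returns per action

    def sample(action):
        # One simulated transition plus a recursive estimate for the next stage.
        next_state, reward = step(state, action, rng)
        q_sum[action] += reward + ams_value(
            next_state, stage + 1, horizon, actions, step, n_samples, rng
        )
        counts[action] += 1

    # Initialization: sample every action once.
    for a in actions:
        sample(a)

    # Adaptive phase: sample the action with the largest UCB1-type index
    # (assumed index; n is the number of samples drawn so far at this state).
    for n in range(len(actions), n_samples):
        best = max(
            actions,
            key=lambda a: q_sum[a] / counts[a]
            + math.sqrt(2.0 * math.log(n) / counts[a]),
        )
        sample(best)

    # Value estimate: per-action averages weighted by sampling frequency.
    total = sum(counts.values())
    return sum((counts[a] / total) * (q_sum[a] / counts[a]) for a in actions)


def toy_step(state, action, rng):
    # Hypothetical two-action random walk: action 1 pays 0.5 per stage, action 0 pays 0.
    reward = 0.5 if action == 1 else 0.0
    next_state = state + rng.choice([-1, 1])
    return next_state, reward


if __name__ == "__main__":
    estimate = ams_value(0, 0, 2, [0, 1], toy_step, 16, random.Random(0))
    print(estimate)
```

Each sampled state spawns up to N recursive estimates at the next stage, which is why the cost grows exponentially in the horizon H while staying independent of the number of states. Because the estimate averages over all sampled actions, including suboptimal ones, it sits at or below the true optimal value in this toy problem and approaches it as N grows.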

Bibliographic details

Authors: Chang, H. S.; Fu, M. C.; Marcus, S. I.
Year: 2002
Pages: 1-18
Total pages: 18
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords: Algorithms; Optimization; Adaptive systems; Markov processes; Decision making; Problem solving; Approximation (mathematics); Convergence; Stochastic control; Sampling

Similar literature

1. An adaptive sampling algorithm for solving Markov decision processes [J]. Chang HS, Fu MC, Hu JQ. Operations Research: The Journal of the Operations Research Society of America. 2005, No. 1
2. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes [J]. Michael Kearns, Yishay Mansour, Andrew Y. Ng. Machine Learning. 2002, No. 2-3
3. Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm [J]. Arruda E. F., Fragoso M. D. European Journal of Operational Research. 2015, No. 3
4. Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes [C]. Aaron Sidford, Mengdi Wang, Xian Wu. Annual ACM-SIAM Symposium on Discrete Algorithms. 2018
5. Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: Efficient decision-making and coordination in uncertain environments [D]. Amato, Christopher. 2010
6. Tracking Problem Solving by Multivariate Pattern Analysis and Hidden Markov Model Algorithms [O]. John R. Anderson
7. An adaptive sampling algorithm for solving Markov decision processes [O]. Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu. 2005
