A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

Michael Kearns; Yishay Mansour; Andrew Y. Ng

Abstract

A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
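The idea in the abstract can be illustrated with a minimal sketch of a sparse-sampling lookahead planner. This is not the authors' code: the function names `estimate_q` and `plan`, the parameters `depth`, `width`, and `gamma`, and the toy `chain_model` simulator are all assumptions made for illustration. The generative model is assumed to be a function `sample(state, action)` returning `(next_state, reward)`. The point it shows is that the planner only ever calls the simulator, so its cost depends on the number of actions, the sample width, and the lookahead depth, and not on how many states exist.

```python
import random


def estimate_q(sample, actions, state, depth, width, gamma):
    """Estimate Q(state, a) for each action from a sparse lookahead tree.

    `sample(state, action)` is the assumed generative model; it returns
    one sampled (next_state, reward) pair.
    """
    if depth == 0:
        # Leaves of the lookahead tree are valued at zero.
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = sample(state, a)
            # Recursively estimate the value of the sampled next state.
            next_q = estimate_q(sample, actions, next_state,
                                depth - 1, width, gamma)
            total += reward + gamma * max(next_q.values())
        q[a] = total / width  # average over the sampled children
    return q


def plan(sample, actions, state, depth=3, width=4, gamma=0.9):
    """Return the action with the highest estimated Q-value at `state`."""
    q = estimate_q(sample, actions, state, depth, width, gamma)
    return max(q, key=q.get)


def chain_model(state, action):
    """Hypothetical simulator: a noisy walk on the (unbounded) integers."""
    step = action if random.random() < 0.8 else -action
    next_state = state + step
    reward = 1.0 if next_state > state else 0.0
    return next_state, reward


if __name__ == "__main__":
    # The state space here is infinite, yet each planning call uses a
    # fixed number of simulator calls.
    print(plan(chain_model, actions=[-1, 1], state=0))
```

In this sketch one call to `plan` makes a number of simulator calls that grows roughly as (number of actions × `width`) raised to the power `depth`, so the cost is exponential in the lookahead depth while remaining independent of the size of the state space.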

Bibliographic details

Source: Machine Learning, 2002, No. 3, pp. 193-208 (16 pages)
Authors: Michael Kearns; Yishay Mansour; Andrew Y. Ng
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: reinforcement learning; Markov decision processes; planning

Similar literature

1. A Sampled Fictitious Play Based Learning Algorithm for Infinite Horizon Markov Decision Processes [J]. Esra Sisikoglu, Marina A. Epelman, Robert L. Smith. Proceedings of the Workshop on Principles of Advanced and Distributed Simulation, 2011, CD-ROM issue.
2. An Adaptive Sampling Algorithm for Solving Markov Decision Processes [J]. Chang HS, Fu MC, Hu JQ. Operations Research: The Journal of the Operations Research Society of America, 2005, No. 1.
3. Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward [J]. Jan Kretinsky, Fabian Michel, Lukas Michel. JMLR: Workshop and Conference Proceedings, 2020, No. 2010.
4. Planning for Markov Decision Processes with Sparse Stochasticity [C]. Maxim Likhachev, Geoff Gordon, Sebastian Thrun. Annual Conference on Neural Information Processing Systems, 2005.
5. Increasing Scalability in Algorithms for Centralized and Decentralized Partially Observable Markov Decision Processes: Efficient Decision-Making and Coordination in Uncertain Environments [D]. Amato, Christopher. 2010.
6. Decision Making Under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes [O]. Rajesh P. N. Rao. 2010.
7. A Sampled Fictitious Play Based Learning Algorithm for Infinite Horizon Markov Decision Processes [O]. S. Jain, R. R. Creasey, J. Himmelspach. 2014.
