Source: JMLR: Workshop and Conference Proceedings

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal

Abstract

This work considers the sample and computational complexity of obtaining an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this model, the learner accesses the underlying transition model via a sampling oracle that, given any state-action pair as input, returns a sample of the next state. We are interested in a basic and unresolved question in model-based planning: is the naïve “plug-in” approach, in which we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP, non-asymptotically minimax optimal? Our main result answers this question positively. With regard to computation, our result provides a simpler approach towards minimax optimal planning: in comparison to prior model-free results, we show that using any high-accuracy, black-box planning oracle in the empirical model suffices to obtain the minimax error rate. The key proof technique uses a leave-one-out analysis, in a novel “absorbing MDP” construction, to decouple the statistical dependency issues that arise in the analysis of model-based planning; this construction may be helpful more generally.
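
The plug-in approach described in the abstract can be illustrated with a minimal sketch. It is not the paper's code. It assumes a small tabular MDP with known rewards, simulates the generative model from a known transition tensor `P` purely for demonstration, and uses value iteration as one possible planning oracle. All function and variable names (`sample_next_states`, `empirical_model`, `value_iteration`, `policy_value`) are hypothetical.

```python
import numpy as np

def sample_next_states(P, s, a, n, rng):
    """Hypothetical generative model: n i.i.d. next-state samples for (s, a)."""
    return rng.choice(P.shape[2], size=n, p=P[s, a])

def empirical_model(P, n, rng):
    """Maximum likelihood estimate of the transitions, n samples per (s, a)."""
    S, A, _ = P.shape
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            nxt = sample_next_states(P, s, a, n, rng)
            # The MLE for a categorical distribution is the empirical frequency.
            P_hat[s, a] = np.bincount(nxt, minlength=S) / n
    return P_hat

def value_iteration(P, R, gamma, tol=1e-8):
    """Stand-in planning oracle: greedy policy and value estimate for (P, R)."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)          # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

def policy_value(P, R, gamma, pi):
    """Exact value of a deterministic policy pi: solve (I - gamma P_pi) V = R_pi."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi, R_pi = P[idx, pi], R[idx, pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

# Demo on a random MDP (sizes and seed are arbitrary choices for illustration).
rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # true transitions, shape (S, A, S)
R = rng.uniform(size=(S, A))                 # rewards, assumed known

P_hat = empirical_model(P, n=2000, rng=rng)       # build the empirical MDP
pi_hat, _ = value_iteration(P_hat, R, gamma)      # plan in the empirical MDP
_, V_star = value_iteration(P, R, gamma)          # reference: plan in the true MDP
gap = np.max(V_star - policy_value(P, R, gamma, pi_hat))
print("suboptimality of plug-in policy in the true MDP:", gap)
```

The quantity `gap` is the suboptimality of the plug-in policy measured in the true MDP. The abstract's result concerns how many samples per state-action pair are needed to make this quantity at most $\epsilon$. The sketch only shows the pipeline and makes no claim about rates.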
