Selecting Near-Optimal Approximate State Representations in Reinforcement Learning

Abstract

We consider a reinforcement learning setting introduced in [5] where the learner does not have explicit access to the states of the underlying Markov decision process (MDP). Instead, she has access to several models that map histories of past interactions to states. Here we improve over known regret bounds in this setting, and more importantly generalize to the case where the models given to the learner do not contain a true model resulting in an MDP representation, but only approximations of it. We also give improved error bounds for state aggregation.
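
To make the setting concrete, the following is a minimal Python sketch of what such history-to-state models might look like. It is purely illustrative and not taken from the paper: the type aliases and the two toy candidate models are assumptions introduced here for illustration only.

```python
# A minimal sketch (not the paper's algorithm) of the setting described above:
# the learner never observes the MDP state directly, only a history of past
# interactions, and is given several candidate models that each map such a
# history to a state. All names below are illustrative assumptions.

from typing import Callable, Hashable, List, Tuple

# A history is a sequence of (observation, action, reward) triples.
History = List[Tuple[Hashable, Hashable, float]]

# A state-representation model maps a history to a (discrete) state.
StateModel = Callable[[History], Hashable]

def last_observation_model(history: History) -> Hashable:
    """Candidate model 1: the state is simply the most recent observation."""
    return history[-1][0] if history else None

def last_two_observations_model(history: History) -> Hashable:
    """Candidate model 2: the state is the pair of the two most recent observations."""
    return tuple(obs for (obs, _, _) in history[-2:])

# The learner is handed a finite set of such models; the paper studies how to
# act with small regret when, at best, some of these induce only an
# *approximate* MDP over their states rather than a true one.
candidate_models: List[StateModel] = [
    last_observation_model,
    last_two_observations_model,
]

if __name__ == "__main__":
    # Example history: observations 'a', 'b', actions 0/1, rewards 0.0/1.0.
    h: History = [("a", 0, 0.0), ("b", 1, 1.0)]
    for phi in candidate_models:
        print(phi.__name__, "->", phi(h))
```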