International Conference on Machine Learning

Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning


Abstract

We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of Õ(T^(3/4)) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be Õ(T^(2/3)) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher dimensional state space.
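To make the estimation step concrete, the following is a minimal sketch, not taken from the paper, of nonparametric kernel density estimation for a 1-dimensional transition density: given next-state samples observed after visits to states near some state s, a Gaussian kernel estimate of p(·|s) is formed. The function name kde_transition_density and the fixed bandwidth value are illustrative assumptions; in the paper the bandwidth choice is tied to the assumed smoothness of the transition densities, and the estimates feed into a model-based learning algorithm rather than standing alone.

import numpy as np

def kde_transition_density(next_states, query, bandwidth):
    # Gaussian kernel density estimate of a transition density p(. | s),
    # built from successor states observed after visiting states near s.
    #   next_states: observed s' samples (1-d array)
    #   query:       points at which to evaluate the estimate (1-d array)
    #   bandwidth:   kernel width h, a tuning parameter whose optimal
    #                choice depends on the smoothness of the true density
    n = next_states.shape[0]
    u = (query[:, None] - next_states[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Illustrative use: 200 observed next states, density evaluated on a grid.
rng = np.random.default_rng(0)
samples = rng.normal(0.5, 0.1, size=200)
grid = np.linspace(0.0, 1.0, 101)
estimate = kde_transition_density(samples, grid, bandwidth=0.05)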
