International Conference on Machine Learning

Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning


Abstract

We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of Õ(T^(3/4)) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be Õ(T^(2/3)) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher dimensional state space.
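To make the estimation step concrete, the following is a minimal sketch, not taken from the paper, of nonparametric kernel density estimation for a 1-dimensional transition density: given next-state samples observed after visits to states near some state s, a Gaussian kernel estimate of p(·|s) is formed. The function name kde_transition_density and the fixed bandwidth value are illustrative assumptions; in the paper the bandwidth choice is tied to the assumed smoothness of the transition densities, and the estimates feed into a model-based learning algorithm rather than standing alone.

import numpy as np

def kde_transition_density(next_states, query, bandwidth):
    # Gaussian kernel density estimate of a transition density p(. | s),
    # built from successor states observed after visiting states near s.
    #   next_states: observed s' samples (1-d array)
    #   query:       points at which to evaluate the estimate (1-d array)
    #   bandwidth:   kernel width h, a tuning parameter whose optimal
    #                choice depends on the smoothness of the true density
    n = next_states.shape[0]
    u = (query[:, None] - next_states[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Illustrative use: 200 observed next states, density evaluated on a grid.
rng = np.random.default_rng(0)
samples = rng.normal(0.5, 0.1, size=200)
grid = np.linspace(0.0, 1.0, 101)
estimate = kde_transition_density(samples, grid, bandwidth=0.05)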
