European Conference on Machine Learning and Knowledge Discovery in Databases

Regret Bounds for Reinforcement Learning with Policy Advice

Abstract

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of Õ(√T) relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.
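To make the setting concrete, the sketch below illustrates the general idea the abstract describes: treat each input policy as an arm of a bandit and select among them using upper confidence bounds on empirical return. This is a simplified, hypothetical illustration, not the authors' actual RLPA algorithm; the `env_reset` and `env_step` callables and the policy interface are assumptions introduced here for the example. Note that the learner tracks only one scalar return estimate per candidate policy, so its bookkeeping scales with the number of input policies rather than with the size of the state or action space, which is the intuition behind the abstract's independence claim.

```python
import math

def policy_advice_ucb(policies, env_reset, env_step, horizon, num_episodes):
    """Toy UCB-style selection over a set of candidate policies.

    policies:     list of callables, each mapping state -> action (assumed)
    env_reset:    callable returning an initial state (assumed interface)
    env_step:     callable (state, action) -> (next_state, reward, done)
    horizon:      maximum episode length
    num_episodes: total number of episodes to run
    """
    n = len(policies)
    counts = [0] * n          # episodes played per candidate policy
    mean_return = [0.0] * n   # empirical average return per policy

    for t in range(1, num_episodes + 1):
        if t <= n:
            # Play each candidate policy once before using confidence bounds.
            k = t - 1
        else:
            # Optimistic choice: empirical mean plus an exploration bonus.
            k = max(range(n), key=lambda i: mean_return[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))

        # Roll out the chosen policy for one episode and record its return.
        state = env_reset()
        ep_return = 0.0
        for _ in range(horizon):
            action = policies[k](state)
            state, reward, done = env_step(state, action)
            ep_return += reward
            if done:
                break

        # Incremental update of the empirical average return for policy k.
        counts[k] += 1
        mean_return[k] += (ep_return - mean_return[k]) / counts[k]

    # Return the index of the empirically best input policy.
    return max(range(n), key=lambda i: mean_return[i])
```

The design choice worth noting is that nothing in this loop enumerates states or actions: all statistics are indexed by policy, which is what allows regret guarantees relative to the best input policy without any dependence on the size of the underlying MDP.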
