European Conference on Machine Learning and Knowledge Discovery in Databases

Regret Bounds for Reinforcement Learning with Policy Advice

Abstract

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of Õ(√T) relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.
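To make the setting concrete, the sketch below illustrates the general idea the abstract describes: treat each input policy as an arm of a bandit and select among them using upper confidence bounds on empirical return. This is a simplified, hypothetical illustration, not the authors' actual RLPA algorithm; the `env_reset` and `env_step` callables and the policy interface are assumptions introduced here for the example. Note that the learner tracks only one scalar return estimate per candidate policy, so its bookkeeping scales with the number of input policies rather than with the size of the state or action space, which is the intuition behind the abstract's independence claim.

```python
import math

def policy_advice_ucb(policies, env_reset, env_step, horizon, num_episodes):
    """Toy UCB-style selection over a set of candidate policies.

    policies:     list of callables, each mapping state -> action (assumed)
    env_reset:    callable returning an initial state (assumed interface)
    env_step:     callable (state, action) -> (next_state, reward, done)
    horizon:      maximum episode length
    num_episodes: total number of episodes to run
    """
    n = len(policies)
    counts = [0] * n          # episodes played per candidate policy
    mean_return = [0.0] * n   # empirical average return per policy

    for t in range(1, num_episodes + 1):
        if t <= n:
            # Play each candidate policy once before using confidence bounds.
            k = t - 1
        else:
            # Optimistic choice: empirical mean plus an exploration bonus.
            k = max(range(n), key=lambda i: mean_return[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))

        # Roll out the chosen policy for one episode and record its return.
        state = env_reset()
        ep_return = 0.0
        for _ in range(horizon):
            action = policies[k](state)
            state, reward, done = env_step(state, action)
            ep_return += reward
            if done:
                break

        # Incremental update of the empirical average return for policy k.
        counts[k] += 1
        mean_return[k] += (ep_return - mean_return[k]) / counts[k]

    # Return the index of the empirically best input policy.
    return max(range(n), key=lambda i: mean_return[i])
```

The design choice worth noting is that nothing in this loop enumerates states or actions: all statistics are indexed by policy, which is what allows regret guarantees relative to the best input policy without any dependence on the size of the underlying MDP.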
