IEEE International Conference on Services Computing

BSE-MAML: Model Agnostic Meta-Reinforcement Learning via Bayesian Structured Exploration



Abstract

Deep reinforcement learning (RL) is playing an increasingly important role in web services such as news recommendation, vulnerability detection, and personalized services. Exploration is a key component of RL, determining whether these RL-based applications can eventually find effective solutions. In this paper, we propose a novel gradient-based fast adaptation approach for model-agnostic meta-reinforcement learning via Bayesian structured exploration (BSE-MAML). BSE-MAML can effectively learn exploration strategies from prior experience by updating the policy with an embedded latent space via a Bayesian mechanism. The coherent stochasticity injected through the latent space is more efficient than random noise and produces exploration strategies that perform well in novel environments. We have conducted extensive experiments to evaluate BSE-MAML. Experimental results show that BSE-MAML achieves better exploration performance in realistic environments with sparse rewards, compared to state-of-the-art meta-RL algorithms, RL methods that do not learn exploration strategies, and task-agnostic exploration approaches.
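The abstract's central idea, injecting "coherent stochasticity" by sampling one latent exploration variable per episode rather than i.i.d. noise at every step, can be illustrated with a toy sketch. This is not the paper's implementation: the linear policy, the 1-D dynamics, and all names here are illustrative assumptions, intended only to show why a per-episode latent yields temporally structured behavior while per-step noise yields uncorrelated dithering.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.5])  # toy policy parameters, held fixed for this sketch

def rollout(per_episode_latent, steps=200):
    """Collect actions from a linear policy in a toy 1-D environment.

    If per_episode_latent is True, the stochastic component z is drawn
    once per episode (coherent, latent-space-style exploration); otherwise
    fresh noise is drawn at every step (conventional action noise).
    """
    state = np.zeros(1)
    z = rng.standard_normal(1)  # per-episode latent, sampled once
    actions = []
    for _ in range(steps):
        noise = z if per_episode_latent else rng.standard_normal(1)
        action = theta @ state + noise
        actions.append(action.item())
        state = 0.9 * state + 0.1 * action  # toy dynamics
    return np.array(actions)

def lag1_autocorr(x):
    """Lag-1 autocorrelation: a simple measure of temporal coherence."""
    x = x - x.mean()
    return (x[:-1] @ x[1:]) / (x @ x)

coherent = lag1_autocorr(rollout(per_episode_latent=True))
iid = lag1_autocorr(rollout(per_episode_latent=False))
print(f"coherent latent: {coherent:.2f}, i.i.d. noise: {iid:.2f}")
```

The per-episode latent drives the agent consistently in one direction for the whole episode (high autocorrelation), which is the kind of structured exploration that helps in sparse-reward settings; per-step noise mostly cancels itself out. In BSE-MAML the latent is additionally treated as a Bayesian posterior updated by the gradient-based adaptation step, which this sketch does not attempt to reproduce.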
