Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. One such direct policy search method is reward-weighted regression (RWR), proposed by Peters et al. The RWR algorithm estimates the policy parameters with the EM algorithm and is therefore prone to overfitting. In this paper, we employ variational Bayesian inference to avoid the overfitting problem and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments on mountain-car and ball-batting tasks. These experiments show that VBRL yields a higher average return and outperforms RWR.
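The EM-style update underlying RWR can be illustrated with a minimal sketch: sample policy parameters from a Gaussian search distribution, weight each rollout by an exponential transformation of its return, and refit the distribution by weighted maximum likelihood. This is only a toy illustration under stated assumptions; the objective function, the inverse temperature `beta`, and all variable names are hypothetical stand-ins, not the paper's actual tasks or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for an episode's return; the paper's actual
    # benchmarks are mountain-car and ball-batting tasks.
    return -np.sum((theta - 1.0) ** 2)

# Gaussian search distribution over policy parameters.
mu, sigma = np.zeros(2), np.ones(2)
beta = 1.0  # inverse temperature for the exponential reward weighting

for _ in range(100):
    thetas = rng.normal(mu, sigma, size=(50, 2))   # sample rollouts
    returns = np.array([episode_return(t) for t in thetas])
    w = np.exp(beta * (returns - returns.max()))   # reward weights
    w /= w.sum()
    # Weighted maximum-likelihood (EM-style) update of the policy:
    mu = w @ thetas
    sigma = np.sqrt(w @ (thetas - mu) ** 2 + 1e-6)

print(np.round(mu, 2))
```

Because this update is a point estimate of the weighted likelihood, the search distribution can collapse onto noisy rollouts; the paper's VBRL replaces it with a variational Bayesian posterior over the policy parameters to regularize against such overfitting.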
