Conference: International Joint Conference on Neural Networks

AC2: A Policy Gradient Actor with Primary and Secondary Critics

Abstract

We propose AC2, a policy gradient algorithm that employs a primary and a secondary critic to manage both bias and variance in policy gradients. We show through analysis and experiments that performance becomes more stable if the secondary critic concentrates on a few problematic states (upper 95th percentile) that cause extreme changes in value estimates. This scheme keeps bias tolerable while lowering variance. We relate our algorithm to critic ensembles that have more components and show that ensemble averaging may not significantly reduce gradient variance in more difficult environments. We test our algorithm in a series of high-dimensional experiments and report better performance than ensembles with more critic components, especially in harder environments. In addition, performance is more stable if the secondary critic trains on a few problematic states than if it trains on randomly sampled states. Our algorithm achieves better reward performance than single-critic and other RL models.
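
The abstract does not spell out how the problematic states are identified. The following is only a minimal sketch of one possible reading: it assumes that "extreme changes in value estimates" means the absolute difference between a critic's estimates before and after an update, and that the secondary critic is given the states above the 95th percentile of that difference. The function name `select_problematic_states`, its parameters, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_problematic_states(values_before, values_after, percentile=95.0):
    """Return indices of states whose value estimate changed the most.

    Assumption (not from the paper): "change" is the absolute difference
    between two successive value estimates for the same state.
    """
    before = np.asarray(values_before, dtype=float)
    after = np.asarray(values_after, dtype=float)
    change = np.abs(after - before)
    # States strictly above the chosen percentile of the change distribution.
    threshold = np.percentile(change, percentile)
    return np.flatnonzero(change > threshold)


# Synthetic illustration: 1000 states, 10 of them with injected large changes.
rng = np.random.default_rng(0)
values_before = rng.normal(size=1000)
values_after = values_before + rng.normal(scale=0.1, size=1000)
values_after[:10] += 5.0  # hypothetical "problematic" states

idx = select_problematic_states(values_before, values_after)
```

Under this reading, `idx` would be the subset of the batch passed to the secondary critic, while the primary critic continues to train on all states. How AC2 actually combines the two critics is not described in the abstract and is left out here.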