
AC2: A Policy Gradient Actor with Primary and Secondary Critics

Abstract

We propose AC2, a policy gradient algorithm that employs a primary and a secondary critic to manage both bias and variance in policy gradients. We show through analyses and experiments that performance becomes more stable if the secondary critic concentrates on a few problematic states (those above the 95th percentile) that cause extreme changes in value estimates. This scheme can keep biases tolerable while lowering variances. We relate our algorithm to critic ensembles with more components and show that ensemble averaging may not significantly reduce gradient variances in more difficult environments. We test our algorithm in a series of high-dimensional experiments and report better performance than ensembles with more critic components, especially in harder environments. In addition, performance is more stable when the secondary critic trains on a few problematic states than when it trains on randomly sampled states. Our algorithm also achieves better reward performance than single-critic and other RL models.
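The abstract does not spell out how the problematic states are picked. The following is a minimal sketch under an assumption of ours: "extreme changes in value estimates" is measured as the absolute change in a state's value estimate across one critic update, and the secondary critic's training set is every state at or above the 95th percentile of that change. The function names `select_problematic_states` and `random_subset` are hypothetical. The random variant mirrors the random-sampling baseline the abstract compares against. NumPy is required.

```python
import numpy as np


def select_problematic_states(states, value_before, value_after, percentile=95.0):
    """Pick states whose value-estimate change is at or above `percentile`.

    Assumption (not from the paper): "extreme change" = |V_after(s) - V_before(s)|.
    Returns the selected states and the boolean mask used to select them.
    """
    states = np.asarray(states)
    change = np.abs(np.asarray(value_after, dtype=float)
                    - np.asarray(value_before, dtype=float))
    threshold = np.percentile(change, percentile)  # e.g. the 95th-percentile cutoff
    mask = change >= threshold
    return states[mask], mask


def random_subset(states, size, rng):
    """Baseline: the same number of states drawn uniformly without replacement."""
    states = np.asarray(states)
    idx = rng.choice(len(states), size=size, replace=False)
    return states[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = np.arange(100)                 # hypothetical state indices
    v_before = np.zeros(100)                # critic values before an update
    v_after = np.arange(100, dtype=float)   # critic values after the update
    hard, mask = select_problematic_states(states, v_before, v_after)
    print(hard)                             # → [95 96 97 98 99] (top 5% by change)
    print(random_subset(states, len(hard), rng))
```

Under this assumption, the secondary critic would train only on `hard`, while the primary critic keeps training on all states. The abstract does not describe how the two critics' estimates are combined in the actor's gradient, and this sketch does not model that step.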

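A standard variance identity gives some intuition for why adding critics may not help much. It is not the paper's own analysis. Suppose K critics produce gradient estimates with equal variance sigma^2 and pairwise correlation rho, a simplifying model we assume here. Then the variance of their average is sigma^2 * (rho + (1 - rho) / K). That value never drops below rho * sigma^2, however large K grows. The function names below are hypothetical. The simulation checks the formula numerically with NumPy.

```python
import numpy as np


def ensemble_mean_variance(sigma2, rho, k):
    """Variance of the mean of k estimators, each with variance sigma2 and
    pairwise correlation rho: sigma2 * (rho + (1 - rho) / k)."""
    return sigma2 * (rho + (1.0 - rho) / k)


def simulated_mean_variance(rho, k, n=200_000, seed=0):
    """Monte Carlo check: x_i = sqrt(rho) * z + sqrt(1 - rho) * e_i gives
    unit-variance estimators with pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal(n)        # error component common to all critics
    own = rng.standard_normal((n, k))      # independent per-critic error
    x = np.sqrt(rho) * shared[:, None] + np.sqrt(1.0 - rho) * own
    return x.mean(axis=1).var()            # empirical variance of the ensemble mean


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        for k in (1, 2, 10):
            print(rho, k, ensemble_mean_variance(1.0, rho, k),
                  round(simulated_mean_variance(rho, k), 3))
```

With rho = 0.9, going from 2 to 10 critics only lowers the variance from 0.95 to 0.91 of a single critic's. This is one way to read the abstract's observation about ensemble averaging. Whether critic errors are that strongly correlated in practice is an assumption here.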
Bibliographic record

Source
International Joint Conference on Neural Networks, 2018, pp. 1-8 (8 pages)
Authors
Alfonso B. Labao; Prospero C. Naval
Original format: PDF
Keywords
Training; Approximation algorithms; Learning (artificial intelligence); Gradient methods; Backpropagation; Machine learning; Function approximation

Similar literature

1. Actor–Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation [J]. Li Luntong, Li Dazi, Song Tianheng. IEEE Transactions on Neural Networks and Learning Systems, 2021, no. 3.

2. Bayesian Policy Gradient and Actor-Critic Algorithms [J]. Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko. Journal of Machine Learning Research, 2016, no. 66.

3. Tunnel ventilation control via an actor-critic algorithm employing nonparametric policy gradients [J]. Baeksuk Chu, Daehie Hong, Jooyoung Park. Journal of Mechanical Science and Technology, 2009, no. 2.

4. AC2: A Policy Gradient Actor with Primary and Secondary Critics [C]. Alfonso B. Labao, Prospero C. Naval. International Joint Conference on Neural Networks, 2018.

5. Language policy formulation and implementation in the South African Apartheid State: Mother tongue and Afrikaans as media of instruction in Black primary and secondary schools, 1953-1979 [D]. November, Melvyn Douglas. 1991.

6. Policy-Gradient and Actor-Critic Based State Representation Learning for Safe Driving of Autonomous Vehicles [O]. Abhishek Gupta, Ahmed Shaharyar Khwaja, Alagan Anpalagan. 2020.

7. Bayesian Policy Gradient and Actor-Critic Algorithms [O]. Ghavamzadeh Mohammad, Engel Yaakov, Valko Michal. 2016.
