Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay

International Joint Conference on Neural Networks


Abstract

This paper presents Noisy Importance Sampling Actor-Critic (NISAC), a set of empirically validated modifications to the advantage actor-critic algorithm (A2C) that enable off-policy reinforcement learning and improve performance. NISAC uses additive action-space noise, aggressive truncation of importance sampling weights, and large batch sizes. We find that additive noise drastically changes how off-sample experience is weighted in policy updates. The modified algorithm converges faster and is more sample efficient than both the on-policy actor-critic A2C and the importance-weighted off-policy actor-critic. Compared to state-of-the-art (SOTA) methods such as actor-critic with experience replay (ACER), NISAC approaches their performance on several of the tested environments while training 40% faster and being significantly easier to implement. The effectiveness of NISAC is demonstrated against existing on-policy and off-policy actor-critic algorithms on a subset of the Atari domain.
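The abstract names three ingredients: additive action-space noise, aggressively truncated importance sampling weights, and large batches. Since the paper's implementation details are not reproduced here, the following is a minimal PyTorch sketch of how these ingredients could combine in an off-policy actor update; the function name `nisac_policy_loss`, the placement of the noise on the policy logits, and the constants `noise_std` and `clip_c` are illustrative assumptions, not the authors' code.

```python
import torch

def nisac_policy_loss(logits, behaviour_logp, actions, advantages,
                      noise_std=0.3, clip_c=1.0):
    # Additive action-space noise (assumed here to be Gaussian on the logits):
    # perturbing the policy before computing pi(a|s) changes how replayed,
    # off-sample actions are weighted.
    noisy_logits = logits + noise_std * torch.randn_like(logits)
    logp = torch.log_softmax(noisy_logits, dim=-1)
    pi_logp = logp.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Importance weight rho = pi(a|s) / mu(a|s), where mu is the behaviour
    # policy that generated the replayed batch; aggressively truncate at clip_c.
    rho = torch.exp(pi_logp - behaviour_logp).clamp(max=clip_c).detach()

    # Off-policy policy-gradient loss, averaged over the (large) replayed batch.
    return -(rho * pi_logp * advantages).mean()

# Usage on a replayed batch (shapes: logits [B, A], the rest [B]):
B, A = 256, 6
logits = torch.randn(B, A, requires_grad=True)
actions = torch.randint(A, (B,))
behaviour_logp = torch.randn(B).clamp(max=0.0)  # stored log mu(a|s) from replay
advantages = torch.randn(B)
loss = nisac_policy_loss(logits, behaviour_logp, actions, advantages)
loss.backward()
```

In this sketch, `clip_c` plays the role of the "aggressive truncation" the abstract mentions: importance weights above it are clipped, bounding the variance of updates computed from replayed experience.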
