Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay

International Joint Conference on Neural Networks


Abstract

This paper presents Noisy Importance Sampling Actor-Critic (NISAC), a set of empirically validated modifications to the advantage actor-critic algorithm (A2C) that enable off-policy reinforcement learning and improve performance. NISAC uses additive action-space noise, aggressive truncation of importance sampling weights, and large batch sizes. We find that additive noise drastically changes how off-sample experience is weighted in policy updates. The modified algorithm converges faster and is more sample efficient than both the on-policy actor-critic A2C and the importance-weighted off-policy actor-critic. Compared to state-of-the-art (SOTA) methods such as actor-critic with experience replay (ACER), NISAC approaches their performance on several of the tested environments while training 40% faster and being significantly easier to implement. The effectiveness of NISAC is demonstrated against existing on-policy and off-policy actor-critic algorithms on a subset of the Atari domain.
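The abstract names three ingredients: additive action-space noise, aggressively truncated importance sampling weights, and large batches. Since the paper's implementation details are not reproduced here, the following is a minimal PyTorch sketch of how these ingredients could combine in an off-policy actor update; the function name `nisac_policy_loss`, the placement of the noise on the policy logits, and the constants `noise_std` and `clip_c` are illustrative assumptions, not the authors' code.

```python
import torch

def nisac_policy_loss(logits, behaviour_logp, actions, advantages,
                      noise_std=0.3, clip_c=1.0):
    # Additive action-space noise (assumed here to be Gaussian on the logits):
    # perturbing the policy before computing pi(a|s) changes how replayed,
    # off-sample actions are weighted.
    noisy_logits = logits + noise_std * torch.randn_like(logits)
    logp = torch.log_softmax(noisy_logits, dim=-1)
    pi_logp = logp.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Importance weight rho = pi(a|s) / mu(a|s), where mu is the behaviour
    # policy that generated the replayed batch; aggressively truncate at clip_c.
    rho = torch.exp(pi_logp - behaviour_logp).clamp(max=clip_c).detach()

    # Off-policy policy-gradient loss, averaged over the (large) replayed batch.
    return -(rho * pi_logp * advantages).mean()

# Usage on a replayed batch (shapes: logits [B, A], the rest [B]):
B, A = 256, 6
logits = torch.randn(B, A, requires_grad=True)
actions = torch.randint(A, (B,))
behaviour_logp = torch.randn(B).clamp(max=0.0)  # stored log mu(a|s) from replay
advantages = torch.randn(B)
loss = nisac_policy_loss(logits, behaviour_logp, actions, advantages)
loss.backward()
```

In this sketch, `clip_c` plays the role of the "aggressive truncation" the abstract mentions: importance weights above it are clipped, bounding the variance of updates computed from replayed experience.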
