Published in Sensors (Basel, Switzerland).

The Actor-Dueling-Critic Method for Reinforcement Learning



Abstract

Model-free reinforcement learning is a powerful and efficient machine-learning paradigm that has been widely used in the robotic-control domain. In the reinforcement-learning setting, the value-function method learns policies by maximizing the state-action value (Q-value), but it suffers from inaccurate Q estimation, which leads to poor performance in stochastic environments. To mitigate this issue, we present an approach based on the actor-critic framework: in the critic branch, we modify the way the Q-value is estimated by introducing an advantage function, as in the dueling network, which estimates the action-advantage value. Because the action-advantage value is independent of state and environment noise, we use it as a fine-tuning factor for the estimated Q-value. We refer to this approach as the actor-dueling-critic (ADC) network, since the framework is inspired by the dueling network. Furthermore, we redesign the dueling-network part of the critic branch to adapt it to continuous action spaces. The method was tested on Gym classic-control environments and an obstacle-avoidance environment, and we designed a noisy environment to test training stability. The results indicate that the ADC approach is more stable and converges faster than DDPG in noisy environments.
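As a rough illustration only (not the authors' implementation), the dueling-style aggregation that the critic branch builds on combines a state value V(s) with per-action advantages A(s, a); a common identifiable formulation subtracts the mean advantage before adding it to V. A minimal sketch:

```python
import numpy as np

def dueling_q(v, advantages):
    """Combine a scalar state value v with per-action advantages into Q-values.

    Subtracting the mean advantage makes the V/A decomposition identifiable:
    the mean of the resulting Q-values equals v, and the advantages act as a
    fine-tuning offset around it. Function name and signature are illustrative.
    """
    advantages = np.asarray(advantages, dtype=float)
    return v + (advantages - advantages.mean())

# Example: state value 2.0 and advantages for three candidate actions.
q = dueling_q(2.0, [1.0, 2.0, 3.0])  # -> array([1., 2., 3.])
```

Here the best action keeps the highest Q-value, while the mean of the Q-values stays anchored at V(s), which is one way to view the advantage term as a noise-robust correction to the Q estimate.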
