International Conference on Artificial Neural Networks (ICANN 2008)

Policy Gradients with Parameter-Based Exploration for Control

Abstract

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.
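The core idea stated in the abstract, exploring by perturbing the policy parameters themselves rather than the actions, can be illustrated with a Gaussian search distribution over a parameter vector. The sketch below is an illustrative reconstruction only, not the paper's implementation: it assumes a toy one-dimensional point-mass task, a diagonal Gaussian N(mu, sigma^2) over two linear-policy gains, a moving-average return baseline, and hand-picked learning rates, none of which come from the paper.

```python
import numpy as np

def episode_return(theta, horizon=50):
    """Toy stand-in for a rollout: a linear policy u = -(k1*x + k2*v) on a
    1-D point mass. Purely illustrative; the paper's tasks (e.g. robust
    standing with a humanoid robot) are far more complex."""
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(horizon):
        u = -(theta[0] * x + theta[1] * v)
        v += 0.1 * u
        x += 0.1 * v
        total -= x * x + 0.01 * u * u   # negative quadratic cost as reward
    return total

def parameter_based_pg(n_params=2, iterations=200, pop=20,
                       alpha_mu=0.1, alpha_sigma=0.05):
    """Parameter-space sampling sketch: draw whole parameter vectors from
    N(mu, diag(sigma^2)), run a deterministic rollout per sample, and follow
    the likelihood-ratio gradient of expected return w.r.t. mu and sigma,
    with a moving-average baseline for variance reduction."""
    mu = np.zeros(n_params)
    sigma = np.ones(n_params)
    baseline = 0.0
    for _ in range(iterations):
        thetas = mu + sigma * np.random.randn(pop, n_params)
        returns = np.array([episode_return(t) for t in thetas])
        adv = returns - baseline
        diff = thetas - mu
        # grad of log N(theta; mu, sigma) w.r.t. mu and sigma
        grad_mu = (adv[:, None] * diff / sigma**2).mean(axis=0)
        grad_sigma = (adv[:, None] * (diff**2 - sigma**2) / sigma**3).mean(axis=0)
        mu += alpha_mu * grad_mu
        sigma = np.maximum(1e-3, sigma + alpha_sigma * grad_sigma)
        baseline = 0.9 * baseline + 0.1 * returns.mean()
    return mu, sigma

if __name__ == "__main__":
    mu, sigma = parameter_based_pg()
    print("learned mean parameters:", mu, "exploration std:", sigma)
```

Because each sampled parameter vector drives a single deterministic rollout, only one stochastic draw is made per episode rather than one per action, which is the intuition behind the lower-variance gradient estimates the abstract claims relative to action-space methods such as REINFORCE.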
