International Conference on Artificial Neural Networks (ICANN 2008)

Policy Gradients with Parameter-Based Exploration for Control

Abstract

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.
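The core idea stated in the abstract, exploring by perturbing the policy parameters themselves rather than the actions, can be illustrated with a Gaussian search distribution over a parameter vector. The sketch below is an illustrative reconstruction only, not the paper's implementation: it assumes a toy one-dimensional point-mass task, a diagonal Gaussian N(mu, sigma^2) over two linear-policy gains, a moving-average return baseline, and hand-picked learning rates, none of which come from the paper.

```python
import numpy as np

def episode_return(theta, horizon=50):
    """Toy stand-in for a rollout: a linear policy u = -(k1*x + k2*v) on a
    1-D point mass. Purely illustrative; the paper's tasks (e.g. robust
    standing with a humanoid robot) are far more complex."""
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(horizon):
        u = -(theta[0] * x + theta[1] * v)
        v += 0.1 * u
        x += 0.1 * v
        total -= x * x + 0.01 * u * u   # negative quadratic cost as reward
    return total

def parameter_based_pg(n_params=2, iterations=200, pop=20,
                       alpha_mu=0.1, alpha_sigma=0.05):
    """Parameter-space sampling sketch: draw whole parameter vectors from
    N(mu, diag(sigma^2)), run a deterministic rollout per sample, and follow
    the likelihood-ratio gradient of expected return w.r.t. mu and sigma,
    with a moving-average baseline for variance reduction."""
    mu = np.zeros(n_params)
    sigma = np.ones(n_params)
    baseline = 0.0
    for _ in range(iterations):
        thetas = mu + sigma * np.random.randn(pop, n_params)
        returns = np.array([episode_return(t) for t in thetas])
        adv = returns - baseline
        diff = thetas - mu
        # grad of log N(theta; mu, sigma) w.r.t. mu and sigma
        grad_mu = (adv[:, None] * diff / sigma**2).mean(axis=0)
        grad_sigma = (adv[:, None] * (diff**2 - sigma**2) / sigma**3).mean(axis=0)
        mu += alpha_mu * grad_mu
        sigma = np.maximum(1e-3, sigma + alpha_sigma * grad_sigma)
        baseline = 0.9 * baseline + 0.1 * returns.mean()
    return mu, sigma

if __name__ == "__main__":
    mu, sigma = parameter_based_pg()
    print("learned mean parameters:", mu, "exploration std:", sigma)
```

Because each sampled parameter vector drives a single deterministic rollout, only one stochastic draw is made per episode rather than one per action, which is the intuition behind the lower-variance gradient estimates the abstract claims relative to action-space methods such as REINFORCE.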
