Implicit incremental natural actor critic algorithm

Abstract

Natural policy gradient (NPG) methods are promising approaches to finding locally optimal policy parameters. The NPG approach works well in optimizing complex policies with high-dimensional parameters, and the effectiveness of NPG methods has been demonstrated in many fields. However, the incremental estimation of the NPG is computationally unstable owing to its high sensitivity to the step-size values, especially the one used to update the NPG estimate. In this study, we propose a new incremental and stable algorithm for NPG estimation. We call the proposed algorithm the implicit incremental natural actor critic (I2NAC); it is based on the idea of the implicit update. A convergence analysis for I2NAC is provided. The theoretical results indicate the stability of I2NAC and the instability of conventional incremental NPG methods. Numerical experiments were performed, and the results show that I2NAC is less sensitive to the values of the meta-parameters, including the step-size for the NPG update, compared to the existing incremental NPG method. (C) 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.
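The implicit-update idea the abstract refers to can be illustrated outside the actor-critic setting with a minimal scalar least-squares sketch. This is generic implicit SGD, not the authors' I2NAC algorithm: the update is defined through the gradient at the *next* iterate, which for a quadratic loss can be solved in closed form and damps the effective step-size automatically.

```python
def explicit_step(w, x, y, lr):
    # Standard (explicit) SGD step on the squared error 0.5 * (x*w - y)**2.
    return w - lr * x * (x * w - y)

def implicit_step(w, x, y, lr):
    # Implicit update: w_next = w - lr * x * (x * w_next - y).
    # For this quadratic loss, solving for w_next gives a closed form whose
    # effective step-size lr / (1 + lr * x**2) stays bounded however large
    # lr is chosen.
    return w - lr * x * (x * w - y) / (1.0 + lr * x * x)

x, y = 1.0, 2.0   # one scalar sample; the minimizer is w* = y / x = 2
lr = 3.0          # deliberately too large for the explicit update
w_exp = w_imp = 0.0
for _ in range(50):
    w_exp = explicit_step(w_exp, x, y, lr)
    w_imp = implicit_step(w_imp, x, y, lr)
# w_exp oscillates and diverges; w_imp converges to 2.0.
```

With lr = 3.0 the explicit iterate's error doubles in magnitude each step, while the implicit iterate's error shrinks by a factor of 4 per step, mirroring the step-size sensitivity contrast the abstract describes for conventional incremental NPG estimation versus I2NAC.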
