Implicit Incremental Natural Actor Critic

Abstract

The natural policy gradient (NPG) method is a promising approach for finding a locally optimal policy parameter. It has demonstrated remarkable success in many fields, including large-scale applications. However, estimating the NPG itself requires an enormous number of samples, and incremental estimation of the NPG is computationally unstable. In this work, we propose a new incremental and stable algorithm for NPG estimation. The proposed algorithm is based on the idea of implicit temporal differences, and we call it implicit incremental natural actor critic (I2NAC). Theoretical analysis indicates that I2NAC is stable, whereas conventional incremental NPG methods are unstable. A numerical experiment shows that I2NAC is less sensitive to the choice of step size.
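
To illustrate why an implicit update can be less sensitive to the step size, the following is a minimal sketch of a generic implicit TD(0) update for a linear value estimate. It is not the I2NAC algorithm itself, which the abstract does not specify. The toy problem (a single self-looping state) and all names (`td_step`, `run`, `phi`, `theta`) are assumptions made for illustration. The sketch assumes the implicit update evaluates the current state's value at the new parameter, which for linear features reduces to scaling the step size by 1 / (1 + alpha * ||phi||^2).

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_step(theta, phi, reward, phi_next, alpha, gamma, implicit):
    """One TD(0) update for a linear value estimate V(s) = phi(s) . theta."""
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    if implicit:
        # Assumed implicit form: solving the update with the new parameter
        # on the right-hand side shrinks the effective step size.
        step = alpha / (1.0 + alpha * dot(phi, phi))
    else:
        step = alpha
    return [t + step * delta * p for t, p in zip(theta, phi)]


def run(alpha, implicit, n_steps=50):
    """Hypothetical toy problem: one state looping to itself with reward 1."""
    phi, reward, gamma = [2.0], 1.0, 0.5
    theta = [0.0]
    for _ in range(n_steps):
        theta = td_step(theta, phi, reward, phi, alpha, gamma, implicit)
    # Estimated value of the state; the true value is 1 / (1 - 0.5) = 2.
    return dot(phi, theta)


# Same deliberately large step size for both variants.
explicit_value = run(alpha=2.0, implicit=False)
implicit_value = run(alpha=2.0, implicit=True)
```

With this step size the explicit update multiplies the parameter error by -3 at every step and diverges, while the implicit update multiplies it by 5/9 and settles near the true value of 2.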