Implicit Incremental Natural Actor Critic

Abstract

The natural policy gradient (NPG) method is a promising approach to finding a locally optimal policy parameter. The NPG method has demonstrated remarkable success in many fields, including large-scale applications. On the other hand, estimating the NPG itself requires an enormous number of samples. Furthermore, incremental estimation of the NPG is computationally unstable. In this work, we propose a new incremental and stable algorithm for NPG estimation. The proposed algorithm is based on the idea of implicit temporal differences, and we call it the implicit incremental natural actor critic (I2NAC). Theoretical analysis indicates that I2NAC is stable, whereas conventional incremental NPG methods are unstable. A numerical experiment shows that I2NAC is less sensitive to the choice of step size.
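
To make the step-size argument concrete, here is a minimal Python sketch of the general idea. It is not the authors' I2NAC. It assumes, as in common incremental natural actor-critic formulations, that the NPG estimate w is fitted by least-mean-squares regression of a TD error onto compatible features psi (the gradient of the log-policy). The explicit step is that conventional update. The implicit step evaluates the residual at the new parameter instead, which in closed form shrinks the step size to alpha / (1 + alpha * ||psi||^2). The helper names (explicit_step, implicit_step, run) and the synthetic features and targets are hypothetical stand-ins, not code or data from the paper.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def explicit_step(w, psi, target, alpha):
    """Conventional LMS step: w <- w + alpha * (target - psi.w) * psi."""
    err = target - dot(psi, w)
    return [wi + alpha * err * pi for wi, pi in zip(w, psi)]


def implicit_step(w, psi, target, alpha):
    """Implicit step: solve w' = w + alpha * (target - psi.w') * psi for w'.

    The closed-form solution is the explicit step with the step size shrunk
    to alpha / (1 + alpha * ||psi||^2), so the new residual on the current
    sample keeps its sign and never overshoots, however large alpha is.
    """
    err = target - dot(psi, w)
    eff = alpha / (1.0 + alpha * dot(psi, psi))
    return [wi + eff * err * pi for wi, pi in zip(w, psi)]


def run(step, alpha, n_steps=2000, w_true=(1.0, -2.0, 0.5, 3.0), noise=0.1, seed=0):
    """Fit w_true from noisy targets and return the final estimation error.

    psi plays the role of hypothetical compatible features and target the
    role of a noisy TD error; both are synthetic. Returns inf on divergence.
    """
    rng = random.Random(seed)
    dim = len(w_true)
    w = [0.0] * dim
    for _ in range(n_steps):
        psi = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        target = dot(psi, w_true) + rng.gauss(0.0, noise)
        w = step(w, psi, target, alpha)
        if not all(math.isfinite(x) for x in w):
            return math.inf
    return math.sqrt(sum((a - b) * (a - b) for a, b in zip(w, w_true)))


for alpha in (0.01, 2.0):
    print(alpha, run(explicit_step, alpha), run(implicit_step, alpha))
```

In this toy setting both updates should converge with alpha = 0.01, while with alpha = 2.0 the explicit update is expected to diverge (run returns inf or a very large error) and the implicit update should stay close to w_true. This mirrors, in simplified form, the step-size insensitivity the abstract reports for I2NAC.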

Bibliographic Record

Source
International Conference on Neural Information Processing, 2017, pp. 749-758 (10 pages)
Authors
Ryo Iwaki; Minoru Asada
Original format: PDF
Keywords
Reinforcement learning; Natural policy gradient; Incremental natural actor critic; Incremental learning; Implicit update

Similar Literature

1. Implicit incremental natural actor critic algorithm [J]. Iwaki Ryo, Asada Minoru. Neural Networks: The Official Journal of the International Neural Network Society, 2019.

2. Incremental Receptive Field Weighted Actor-Critic [J]. Lee D.-H., Lee J.-J. IEEE Transactions on Industrial Informatics, 2013, no. 1.

3. Efficient data use in incremental actor-critic algorithms [J]. Yuhu Cheng, Huanting Feng, Xuesong Wang. Neurocomputing, 2013, issue sepa20.

4. Implicit Incremental Natural Actor Critic [C]. Ryo Iwaki, Minoru Asada. International Conference on Neural Information Processing, 2017.

5. Mars: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler [D]. Baheri, Betis. 2020.

6. Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture [O]. Cai Li, Robert Lowe, Tom Ziemke. 2013.

7. Fitted natural actor-critic: A new algorithm for continuous state-action MDPs [O]. Francisco S. Melo, Manuel Lopes. 2016.
