Implicit incremental natural actor critic algorithm

Abstract

Natural policy gradient (NPG) methods are promising approaches to finding locally optimal policy parameters. The NPG approach works well in optimizing complex policies with high-dimensional parameters, and the effectiveness of NPG methods has been demonstrated in many fields. However, the incremental estimation of the NPG is computationally unstable owing to its high sensitivity to the step-size values, especially the one used to update the NPG estimate. In this study, we propose a new incremental and stable algorithm for NPG estimation. We call the proposed algorithm the implicit incremental natural actor critic (I2NAC); it is based on the idea of the implicit update. A convergence analysis for I2NAC is provided. The theoretical results indicate the stability of I2NAC and the instability of conventional incremental NPG methods. Numerical experiments were performed, and the results show that I2NAC is less sensitive to the values of the meta-parameters, including the step-size for the NPG update, compared to the existing incremental NPG method. (C) 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.
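The implicit-update idea the abstract refers to can be illustrated outside the actor-critic setting with a minimal scalar least-squares sketch. This is generic implicit SGD, not the authors' I2NAC algorithm: the update is defined through the gradient at the *next* iterate, which for a quadratic loss can be solved in closed form and damps the effective step-size automatically.

```python
def explicit_step(w, x, y, lr):
    # Standard (explicit) SGD step on the squared error 0.5 * (x*w - y)**2.
    return w - lr * x * (x * w - y)

def implicit_step(w, x, y, lr):
    # Implicit update: w_next = w - lr * x * (x * w_next - y).
    # For this quadratic loss, solving for w_next gives a closed form whose
    # effective step-size lr / (1 + lr * x**2) stays bounded however large
    # lr is chosen.
    return w - lr * x * (x * w - y) / (1.0 + lr * x * x)

x, y = 1.0, 2.0   # one scalar sample; the minimizer is w* = y / x = 2
lr = 3.0          # deliberately too large for the explicit update
w_exp = w_imp = 0.0
for _ in range(50):
    w_exp = explicit_step(w_exp, x, y, lr)
    w_imp = implicit_step(w_imp, x, y, lr)
# w_exp oscillates and diverges; w_imp converges to 2.0.
```

With lr = 3.0 the explicit iterate's error doubles in magnitude each step, while the implicit iterate's error shrinks by a factor of 4 per step, mirroring the step-size sensitivity contrast the abstract describes for conventional incremental NPG estimation versus I2NAC.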
