More on training strategies for critic and action neural networks in dual heuristic programming method

Abstract

The article describes a modification to the usual procedures for training the critic and action neural networks in the dual heuristic programming (DHP) method (D. Prokhorov and D. Wunsch, 1996; R. Santiago, 1995; P. Werbos, 1994). The modification entails updating both the critic and the action networks at each computational cycle, rather than only one at a time. The distinguishing feature is the introduction of a (real) second copy of the critic network whose weights are adjusted less often; the "desired value" for training the other critic is obtained from this critic copy. Previously (G. Lendaris and C. Paintz, 1997), the proposed modified training strategy was demonstrated on the pole-cart controller problem: the full 6-dimensional state vector was input to the critic and action NNs, but the utility function involved only the pole angle, not the distance along the track (x). For the first set of results presented here, the 3 states associated with the x variable were eliminated from the inputs to the NNs, keeping the same utility function previously defined. This resulted in improved learning and controller performance. From this point, the method is applied to two additional problems of increasing complexity: in the first, an x-related term is added to the utility function for the pole-cart problem, and the x-related states are simultaneously added back into the NN inputs (i.e., increasing the number of state variables used from 3 to 6); the second involves steering a vehicle with an independent drive motor on each wheel. The problem contexts and experimental results are provided.
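To make the described update structure concrete, the following is a minimal sketch of one training cycle, written here in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the plant model, utility function, network sizes, learning rates, and the copy interval K are placeholders. The two points it tracks from the abstract are that the critic and action networks are both updated at every computational cycle, and that the critic's training target (the "desired value") is computed from a separate copy of the critic whose weights are adjusted less often.

```python
# Minimal sketch of the modified DHP training cycle, assuming PyTorch.
# The plant model, utility function, network sizes, learning rates, and
# the copy interval K are illustrative placeholders, not the paper's design.
import copy
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, K = 3, 1, 0.95, 10

# Critic outputs lambda(t) = dJ/ds (one value per state component);
# the action network outputs the control signal.
critic = nn.Sequential(nn.Linear(STATE_DIM, 16), nn.Tanh(), nn.Linear(16, STATE_DIM))
actor = nn.Sequential(nn.Linear(STATE_DIM, 16), nn.Tanh(), nn.Linear(16, ACTION_DIM))

# The "second copy" of the critic: its weights are refreshed only every
# K cycles, and it supplies the desired value for training the main critic.
critic_copy = copy.deepcopy(critic)
for p in critic_copy.parameters():
    p.requires_grad_(False)

opt_c = torch.optim.SGD(critic.parameters(), lr=1e-2)
opt_a = torch.optim.SGD(actor.parameters(), lr=1e-2)

# Stand-in differentiable plant model s(t+1) = f(s(t), a(t)); a real DHP
# setup would use a model of the pole-cart dynamics here.
A = 0.99 * torch.eye(STATE_DIM)
B = 0.10 * torch.ones(STATE_DIM, ACTION_DIM)

def plant(s, a):
    return s @ A.T + a @ B.T

def utility(s, a):
    # Assumed quadratic penalty on the first state component (e.g. pole angle).
    return (s[:, 0] ** 2).sum() + 0.01 * (a ** 2).sum()

s = torch.randn(1, STATE_DIM)
for t in range(1000):
    s = s.detach().requires_grad_(True)
    a = actor(s)
    s_next = plant(s, a)
    lam_next = critic_copy(s_next).detach()  # desired value comes from the copy

    # DHP critic target: total derivative d[U + gamma * J(s(t+1))]/ds(t),
    # obtained with autograd through the plant and the action network.
    surrogate = utility(s, a) + GAMMA * (lam_next * s_next).sum()
    target = torch.autograd.grad(surrogate, s, retain_graph=True)[0]

    # Critic update at this cycle ...
    critic_loss = ((critic(s.detach()) - target.detach()) ** 2).mean()
    opt_c.zero_grad()
    critic_loss.backward()
    opt_c.step()

    # ... and the action-network update at the very same cycle.
    actor_loss = utility(s.detach(), a) + GAMMA * (lam_next * plant(s.detach(), a)).sum()
    opt_a.zero_grad()
    actor_loss.backward()
    opt_a.step()

    # The critic copy is adjusted less often: refresh it every K cycles.
    if (t + 1) % K == 0:
        critic_copy.load_state_dict(critic.state_dict())

    s = s_next
```

In this sketch the copy is refreshed by a hard weight copy every K cycles; how frequently the copy's weights are adjusted is the tunable choice that the modified strategy introduces.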