Conference: IEEE International Conference on Systems

More on training strategies for critic and action neural networks in dual heuristic programming method

Abstract

This article describes a modification to the usual procedures for training the critic and action neural networks in the dual heuristic programming (DHP) method (D. Prokhorov and D. Wunsch, 1996; R. Santiago, 1995; P. Werbos, 1994). The modification updates both the critic and the action networks at each computational cycle, rather than only one at a time. The distinction lies in the introduction of a (real) second copy of the critic network whose weights are adjusted less often; the "desired value" for training the other critic is obtained from this critic copy. Previously (G. Lendaris and C. Paintz, 1997), the modified training strategy was demonstrated on the pole-cart controller problem: the full 6-dimensional state vector was input to the critic and action NNs, but the utility function involved only the pole angle, not the distance along the track (x). For the first set of results presented here, the 3 states associated with the x variable were eliminated from the inputs to the NNs, keeping the previously defined utility function. This resulted in improved learning and controller performance. The method is then applied to two additional problems of increasing complexity: in the first, an x-related term is added to the utility function for the pole-cart problem and, simultaneously, the x-related states are added back to the NN inputs (i.e., the number of state variables used increases from 3 to 6); the second concerns steering a vehicle with independent drive motors on each wheel. The problem contexts and experimental results are provided.
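
The training schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical linear plant (A, B), a quadratic utility (Q, R), and linear maps W and K standing in for the critic and action neural networks. The names dhp_cycle, train, and sync_every are invented for illustration. The sketch shows only the two features named in the abstract: both critic and action are updated in every cycle, and the critic's desired value comes from a second critic copy that is refreshed less often. Using the live critic for the action gradient is also an assumption.

```python
import numpy as np

def dhp_cycle(x, W, W_copy, K, A, B, Q, R, gamma=0.95, lr_c=0.05, lr_a=0.01):
    """One computational cycle: update BOTH critic (W) and action (K)."""
    u = K @ x                    # action "network" (linear stand-in, assumed)
    x_next = A @ x + B @ u       # plant model (assumed known and linear)
    # DHP critic estimates lambda(x) = dJ/dx; the desired value is built
    # from the slowly updated critic copy, not the critic being trained.
    lam_next_copy = W_copy @ x_next
    target = (2 * Q @ x + K.T @ (2 * R @ u)
              + gamma * (A + B @ K).T @ lam_next_copy)
    critic_err = W @ x - target
    # Action gradient dJ/du; using the live critic here is an assumption.
    dJ_du = 2 * R @ u + gamma * B.T @ (W @ x_next)
    W_new = W - lr_c * np.outer(critic_err, x)
    K_new = K - lr_a * np.outer(dJ_du, x)
    return W_new, K_new, x_next

def train(steps=200, sync_every=10, seed=0):
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical double integrator
    B = np.array([[0.0], [0.1]])
    Q = np.eye(2)
    R = 0.1 * np.eye(1)
    W = np.zeros((2, 2))
    W_copy = W.copy()                        # second (real) critic copy
    K = np.zeros((1, 2))
    x = rng.uniform(-1, 1, size=2)
    syncs = 0
    for t in range(1, steps + 1):
        W, K, x = dhp_cycle(x, W, W_copy, K, A, B, Q, R)
        if t % sync_every == 0:              # copy is adjusted less often
            W_copy = W.copy()
            syncs += 1
        if np.linalg.norm(x) > 2.0 or t % 20 == 0:
            x = rng.uniform(-1, 1, size=2)   # start a new trial
    return W, W_copy, K, syncs

if __name__ == "__main__":
    W, W_copy, K, syncs = train()
    print("critic copy refreshed", syncs, "times")
    print("action gains:", K)
```

The refresh interval sync_every is a free parameter in this sketch; the abstract states only that the copy's weights are adjusted less often than those of the trained critic.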