In this paper, we propose a new adaptive actor-critic algorithm with multi-step simulated experiences, as a kind of temporal difference (TD) method. In our approach, the TD-error is composed of two value functions and m utility functions, where m denotes the number of steps over which the experience is simulated. The value function is constructed from the critic, formulated as a radial basis function neural network (RBFNN), whose input is a simulated experience generated from a predictive model based on a kinematic model. Since our approach assumes that a model is available both to simulate the m-step experiences and to design a controller, the same kinematic model is also used to construct the actor. The resultant model-based actor (MBA) is likewise regarded as a network, namely a resolved velocity control network. We apply this approach to the control of a nonholonomic mobile robot, specifically to a trajectory tracking control problem for the position coordinates and azimuth. Simulations show the effectiveness of the proposed method for controlling a mobile robot with two independent driving wheels.
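As a rough illustration of how a TD-error could combine two value functions and m utility functions, the sketch below rolls a kinematic model forward m steps and evaluates a Gaussian RBF critic at the start and end of the rollout. It is a minimal sketch under stated assumptions, not the paper's implementation: the unicycle-type kinematic model, the Euler discretization, the quadratic utility, the discount factor `gamma`, and all function and parameter names are hypothetical choices made for this example.

```python
import numpy as np


def unicycle_step(state, v, omega, dt):
    """One Euler step of an assumed unicycle kinematic model.

    state = (x, y, theta); v is linear velocity, omega is angular velocity.
    """
    x, y, theta = state
    return np.array([
        x + v * np.cos(theta) * dt,
        y + v * np.sin(theta) * dt,
        theta + omega * dt,
    ])


def rbf_value(state, centers, width, weights):
    """Critic sketch: weighted sum of Gaussian radial basis functions."""
    sq_dist = np.sum((centers - state) ** 2, axis=1)
    return float(weights @ np.exp(-sq_dist / (2.0 * width ** 2)))


def utility(state, reference):
    """Hypothetical utility: negative squared tracking error."""
    err = state - reference
    return -float(err @ err)


def m_step_td_error(state, controls, reference, centers, width, weights,
                    gamma=0.95, dt=0.1):
    """m-step TD-error from simulated experience (assumed form).

    controls is a sequence of m (v, omega) pairs. The result combines
    m discounted utilities with two critic evaluations:
        sum_k gamma^k * U_k + gamma^m * V(s_m) - V(s_0)
    """
    sim = np.asarray(state, dtype=float)
    v_start = rbf_value(sim, centers, width, weights)
    discounted_utilities = 0.0
    for k, (v, omega) in enumerate(controls):
        sim = unicycle_step(sim, v, omega, dt)  # simulated experience
        discounted_utilities += gamma ** k * utility(sim, reference)
    v_end = rbf_value(sim, centers, width, weights)
    return discounted_utilities + gamma ** len(controls) * v_end - v_start
```

Here the number of simulated steps m is simply `len(controls)`, so the same function covers the one-step case and longer rollouts. In an actor-critic loop the returned TD-error would typically drive the update of the critic weights, but the update rule itself is not specified in the abstract and is left out.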