Policy improvement method and policy improvement device
Abstract
PROBLEM TO BE SOLVED: To generate a feedback coefficient matrix that gives a policy optimizing an accumulated cost or an accumulated reward.
SOLUTION: The change in the state of a control target 110 is defined by a linear difference equation, and the immediate cost or immediate reward of the control target 110 is defined as a quadratic form of its state and its input. A policy improvement device 100 perturbs each component of the feedback coefficient matrix that gives the policy and generates a temporal-difference (TD) error with respect to an estimated state value function, that is, an estimate of the state value function. Based on the TD error and the perturbation, the policy improvement device 100 generates an estimated gradient function matrix, which estimates the gradient function matrix of the state value function with respect to the feedback coefficient matrix for the state. The policy improvement device 100 then updates the feedback coefficient matrix using the generated estimated gradient function matrix.
SELECTED DRAWING: Figure 1
COPYRIGHT: (C)2019, JPO&INPIT
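The loop described in the abstract (perturb each component, form a TD error, estimate a gradient, update the matrix) can be illustrated with a small numerical sketch. This is not the patented procedure, only a simplified reading of it under several assumptions that do not come from the abstract: the dynamics are x' = A x + B u with policy u = F x, the cost is discounted with a factor gamma, the gradient estimate is the one-step TD error divided by the perturbation size, and the state value function is computed from the known model rather than estimated from data. All function names, matrices and step sizes below are hypothetical.

```python
import numpy as np

def value_matrix(A, B, Q, R, gamma, F, iters=500):
    """Return P such that V(x) = x^T P x for the policy u = F x.

    Assumption: P is computed from the known model by fixed-point
    iteration; the abstract instead uses an *estimated* value function.
    """
    A_cl = A + B @ F                 # closed-loop dynamics
    stage = Q + F.T @ R @ F          # quadratic immediate cost under u = F x
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = stage + gamma * A_cl.T @ P @ A_cl
    return P

def td_error(A, B, Q, R, gamma, P, F_used, x):
    """One-step TD error of V(x) = x^T P x when the input is u = F_used x."""
    u = F_used @ x
    x_next = A @ x + B @ u
    cost = x @ Q @ x + u @ R @ u
    return cost + gamma * (x_next @ P @ x_next) - x @ P @ x

def estimate_gradient(A, B, Q, R, gamma, F, states, eps=1e-4):
    """Perturb each component of F and divide the mean TD error by eps."""
    P = value_matrix(A, B, Q, R, gamma, F)
    G = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            F_pert = F.copy()
            F_pert[i, j] += eps      # perturb a single component
            deltas = [td_error(A, B, Q, R, gamma, P, F_pert, x) for x in states]
            G[i, j] = np.mean(deltas) / eps
    return G

def improve_policy(A, B, Q, R, gamma, F, states, step=0.5, rounds=40):
    """Repeatedly move F against the estimated gradient (cost minimization)."""
    for _ in range(rounds):
        F = F - step * estimate_gradient(A, B, Q, R, gamma, F, states)
    return F

# Hypothetical example system: a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
gamma = 0.9
states = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # fixed probe states
F0 = np.zeros((1, 2))                                   # initial policy: u = 0
F_new = improve_policy(A, B, Q, R, gamma, F0, states)
```

Comparing `np.trace(value_matrix(A, B, Q, R, gamma, F0))` with the same quantity for `F_new` gives a rough check that the updated feedback coefficient matrix lowers the accumulated cost. The step size is kept small so that each update stays between the current policy and the one-step greedy policy for this example.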