The present invention belongs to the technical field of power automation control and provides a multi-agent deep reinforcement learning proxy method based on a smart grid. The method comprises: S1, calculating, according to a selected action, the corresponding action standard value in the current state, and updating the parameters of a neural network; S2, establishing an "external competition, internal cooperation" multi-agent proxy according to the types of consumers and producers; S3, setting a reward function for each internal agent according to both the profit maximization of that agent's own actions and the interests of the other internal agents. The input layer of the neural network can directly accept the values of the features describing the state, whereas a Q-table must discretize the feature values to reduce the state space.
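The three ideas in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the claimed implementation: the "action standard value" of S1 is read here as a one-step Q-learning target, the reward of S3 is taken to be a weighted blend of an agent's own profit and the mean profit of the other internal agents, and the names `td_target`, `cooperative_reward`, `discretize`, `gamma`, and `weight` are hypothetical.

```python
from bisect import bisect_right


def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Assumed form of the S1 target: r + gamma * max over next-state Q-values."""
    if terminal or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def cooperative_reward(own_profit, other_profits, weight=0.5):
    """Assumed form of the S3 reward: blend own profit with the mean
    profit of the other internal agents (weight=0 is purely selfish)."""
    if not other_profits:
        return own_profit
    mean_others = sum(other_profits) / len(other_profits)
    return (1.0 - weight) * own_profit + weight * mean_others


def discretize(value, bin_edges):
    """Map a continuous feature value to a bin index, as a Q-table requires."""
    return bisect_right(bin_edges, value)


# Hypothetical state features, e.g. a normalized price and a demand level.
state_features = [0.37, 12.5]
edges_per_feature = [[0.25, 0.5, 0.75], [10.0, 20.0]]

# A neural network can take the raw feature values as its input vector.
network_input = list(state_features)

# A Q-table needs a discrete key, so each feature is binned first.
table_key = tuple(
    discretize(v, edges) for v, edges in zip(state_features, edges_per_feature)
)
```

With these bin edges the Q-table has only 4 x 3 = 12 distinct states, at the cost of treating all feature values within a bin as identical; the network input keeps the original resolution.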