Reinforcement learning method, reinforcement learning program, and reinforcement learning device
Abstract
PROBLEM TO BE SOLVED: To improve the learning efficiency of reinforcement learning.

SOLUTION: A value function learning unit 403 performs a unit learning step, in which it learns a value function based on the received state of a wind power generation facility 400, the reward of the wind power generation facility 400, and the action taken on the wind power generation facility 400. An experience degree calculation unit 404 updates an experience degree function based on the same received state, reward, and action. Using the experience degree function, the experience degree calculation unit 404 calculates the experience degree of the current state or action of the wind power generation facility 400 and the experience degree of other states or actions. A value function correction unit 405 determines, based on the value function and the experience degrees, whether to further update the value function. When it determines that the value function is to be updated, the value function correction unit 405 updates the value function based on the value function and the experience degrees, exploiting monotonicity.

[Selection diagram] Fig. 4
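The abstract does not specify the form of the value function, the experience degree function, or which monotonicity is exploited. The following is a minimal Python sketch under loudly stated assumptions: a tabular action-value function learned by one-step Q-learning, an experience degree derived from visit counts as n / (n + 1), and a hypothetical monotonicity in which Q(s, a) is non-decreasing in a discretized state index (for example, a wind-speed bin). The class name `MonotoneCorrectedQLearner`, the threshold `min_visits`, and the clamping rule are illustrative inventions, not the method claimed in the patent.

```python
class MonotoneCorrectedQLearner:
    """Hypothetical sketch: tabular Q-learning plus a visit-count
    'experience degree' and a monotonicity-based correction step."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, min_visits=5):
        self.n_states = n_states
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # value function Q(s, a)
        self.visits = [[0] * n_actions for _ in range(n_states)]  # experience degree function (assumed: visit counts)
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.min_visits = min_visits  # hypothetical threshold for trusting a pair

    def experience(self, s, a):
        # Experience degree in [0, 1), increasing with the number of visits.
        n = self.visits[s][a]
        return n / (n + 1.0)

    def unit_learning_step(self, s, a, r, s_next):
        # Value function learning: ordinary one-step Q-learning update.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
        # Experience degree update for the visited state-action pair.
        self.visits[s][a] += 1
        # Value function correction; returns True if any other entry changed.
        return self.correct(s, a)

    def correct(self, s, a):
        # Decide whether to further update: only a well-experienced pair
        # is allowed to correct other entries (assumed criterion).
        if self.visits[s][a] < self.min_visits:
            return False
        e_cur = self.experience(s, a)
        v = self.q[s][a]
        changed = False
        for s2 in range(self.n_states):
            # Leave states that are at least as experienced untouched.
            if s2 == s or self.experience(s2, a) >= e_cur:
                continue
            # Assumed monotonicity: Q(s, a) is non-decreasing in s, so a
            # higher state must not be worth less, nor a lower state more.
            if (s2 > s and self.q[s2][a] < v) or (s2 < s and self.q[s2][a] > v):
                self.q[s2][a] = v
                changed = True
        return changed


if __name__ == "__main__":
    learner = MonotoneCorrectedQLearner(n_states=5, n_actions=2, min_visits=3)
    for _ in range(3):
        corrected = learner.unit_learning_step(2, 0, 1.0, 2)
    print(corrected, [round(row[0], 5) for row in learner.q])
```

Because only less-experienced entries are clamped, a few well-sampled transitions can shape the estimates of rarely visited states, which is one plausible reading of how such a correction could improve learning efficiency; the patent's actual update criterion and monotonicity may differ.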