EXPERIENCE REINFORCEMENT TYPE REINFORCEMENT LEARNING SYSTEM, EXPERIENCE REINFORCEMENT TYPE REINFORCEMENT LEARNING METHOD AND EXPERIENCE REINFORCEMENT TYPE REINFORCEMENT LEARNING PROGRAM
PROBLEM TO BE SOLVED: To provide an experience reinforcement type reinforcement learning system and the like in which learning to avoid punishment is kept from exerting a large influence on the learning result for obtaining reward.
SOLUTION: The experience reinforcement type reinforcement learning system includes: a state recognition means 1 for recognizing the state of an agent A; a rule selection means 2 for selecting one of the selectable state/action rules on the basis of an evaluation value; a reward evaluation value reinforcement means 3 for defining the series of all state/action rules selected up to the time a reward is obtained as an episode and collectively reinforcing the reward evaluation values of all state/action rules in that episode by a weight for the reward; a punishment evaluation value reinforcement means 4 for defining the series of all state/action rules selected up to the time a punishment is received as an episode and collectively reinforcing the punishment evaluation values of all state/action rules in that episode by a weight for the punishment; and an evaluation value operation means 5 for obtaining an evaluation value Q from the function Q = Q(q[+], q[-]), where q[+] denotes the reward evaluation value and q[-] denotes the punishment evaluation value.
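The arrangement of means 2 to 5 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the abstract does not give the form of Q(q[+], q[-]), so the difference q[+] - q[-] is assumed here; greedy selection with random tie-breaking is assumed for the rule selection means; and every rule in an episode is assumed to be reinforced by the same weight. The class name, method names, and parameters are hypothetical. The state passed to `select_rule` is taken to be the output of the state recognition means 1.

```python
import random
from collections import defaultdict


class ExperienceReinforcementAgent:
    """Hypothetical sketch keeping separate reward and punishment values per rule."""

    def __init__(self, actions, reward_weight=1.0, punishment_weight=1.0, seed=0):
        self.actions = list(actions)
        self.reward_weight = reward_weight          # weight for the reward
        self.punishment_weight = punishment_weight  # weight for the punishment
        self.q_plus = defaultdict(float)   # q[+] for each (state, action) rule
        self.q_minus = defaultdict(float)  # q[-] for each (state, action) rule
        self.episode = []                  # rules selected since the last outcome
        self.rng = random.Random(seed)

    def evaluation(self, state, action):
        # Evaluation value operation (means 5).
        # ASSUMPTION: Q(q[+], q[-]) = q[+] - q[-]; the abstract leaves Q unspecified.
        rule = (state, action)
        return self.q_plus[rule] - self.q_minus[rule]

    def select_rule(self, state):
        # Rule selection (means 2).
        # ASSUMPTION: greedy choice on Q, ties broken at random.
        values = [self.evaluation(state, a) for a in self.actions]
        best = max(values)
        candidates = [a for a, v in zip(self.actions, values) if v == best]
        action = self.rng.choice(candidates)
        self.episode.append((state, action))
        return action

    def reinforce_reward(self):
        # Reward reinforcement (means 3): raise q[+] of every rule in the episode.
        for rule in self.episode:
            self.q_plus[rule] += self.reward_weight
        self.episode = []

    def reinforce_punishment(self):
        # Punishment reinforcement (means 4): raise q[-] of every rule in the episode.
        for rule in self.episode:
            self.q_minus[rule] += self.punishment_weight
        self.episode = []
```

Because q[+] and q[-] are stored separately, a punishment changes only q[-], and the reward evaluation values learned so far stay intact. The two values interact only inside `evaluation`, which is where a different choice of Q would be substituted.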