METHOD AND APPARATUS OF QUANTIFYING RELIABILITY OF LATENT POLICY, EFFICIENCY OF EPISODIC ENCODING, AND TASK GENERALIZABILITY FOR DEVELOPING HUMAN-LIKE REINFORCEMENT LEARNING MODEL
Abstract
A method and apparatus are presented for quantifying policy reliability, information-processing efficiency, and generalization ability in the design of generalizable, human-like reinforcement learning algorithms. According to an embodiment, the computer-implemented quantification method for designing a generalizable human-like reinforcement learning model includes a policy reliability quantification step. This step quantifies how well a reinforcement learning model, derived through inverse reinforcement learning in order to transfer the generalization ability of the human reinforcement learning process to the model, reflects changes in the task context in its policy. The policy reliability quantification step includes: approximating a mapping function between the task parameters of the task and a human behavior profile; approximating a mapping function between the task parameters and the behavior profile of a reinforcement learning algorithm; and comparing the two approximated mapping functions.
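The three sub-steps of the policy reliability quantification step can be illustrated with a minimal sketch. The abstract does not say how the mapping functions are approximated or how they are compared, so everything specific here is an assumption made for illustration: a least-squares linear fit stands in for the approximator, cosine similarity of the fitted slopes stands in for the comparison, and the names `fit_mapping` and `policy_reliability`, along with the synthetic data, are hypothetical.

```python
import numpy as np


def fit_mapping(task_params, behavior_profile):
    """Approximate behavior_profile ~ [task_params, 1] @ W by least squares.

    Assumption: a linear map stands in for the unspecified approximator.
    Returns W with shape (n_params + 1, n_behavior_features); the last row
    is the intercept.
    """
    X = np.column_stack([task_params, np.ones(len(task_params))])
    W, *_ = np.linalg.lstsq(X, behavior_profile, rcond=None)
    return W


def policy_reliability(W_human, W_model):
    """Compare two fitted mappings (assumed metric: cosine similarity).

    Only the slope rows are compared, since they describe how behavior
    changes as the task parameters change.
    """
    a = W_human[:-1].ravel()
    b = W_model[:-1].ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Synthetic, purely illustrative data: 200 task settings, 2 task parameters,
# 2 behavioral features.
rng = np.random.default_rng(0)
task_params = rng.uniform(0.0, 1.0, size=(200, 2))
true_W = np.array([[1.0, -0.5], [0.3, 0.8]])

human = task_params @ true_W + 0.05 * rng.normal(size=(200, 2))
model_good = task_params @ true_W + 0.05 * rng.normal(size=(200, 2))
model_flat = 0.05 * rng.normal(size=(200, 2))  # ignores the task context

W_human = fit_mapping(task_params, human)
score_good = policy_reliability(W_human, fit_mapping(task_params, model_good))
score_flat = policy_reliability(W_human, fit_mapping(task_params, model_flat))
print(score_good, score_flat)
```

Under these assumptions, a model whose behavior tracks the task parameters the way the human data does scores close to 1, while a model that ignores the task context scores lower.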