
METHOD AND APPARATUS OF QUANTIFYING RELIABILITY OF LATENT POLICY, EFFICIENCY OF EPISODIC ENCODING, AND TASK GENERALIZABILITY FOR DEVELOPING HUMAN-LIKE REINFORCEMENT LEARNING MODEL



Abstract

A method and apparatus are presented for quantifying policy reliability, information-processing efficiency, and generalization ability, for use in designing generalizable, human-like reinforcement learning algorithms. According to one embodiment, the quantification method, performed by a computer to design a generalizable reinforcement learning model that simulates human learning, is derived through inverse reinforcement learning in order to transfer the generalization ability of the human reinforcement learning process to the reinforcement learning model. The method includes a policy reliability quantification step that quantifies how much the reinforcement learning model reflects a change in the task context in its policy. The policy reliability quantification step includes: approximating a mapping function between a task parameter of the task and a human behavior profile; approximating a mapping function between the task parameter and the behavior profile of a reinforcement learning algorithm; and comparing the two approximated mapping functions.
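The abstract does not say how the two mapping functions are approximated or how they are compared. As a minimal sketch only, assuming a scalar task parameter, a behavior profile represented as a fixed-length numeric vector, a per-dimension linear least-squares fit as the approximation, and the mean squared difference over a grid of task parameters as the comparison, the three sub-steps of the policy reliability quantification step could look like this. The function names and the score `1 / (1 + discrepancy)` are hypothetical illustrations, not the patent's method.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one (slope, intercept) pair per profile dimension.
LinearMap = List[Tuple[float, float]]


def fit_linear_map(params: Sequence[float],
                   profiles: Sequence[Sequence[float]]) -> LinearMap:
    """Approximate task parameter -> behavior profile by ordinary least squares.

    Assumption: each profile dimension d is modeled as a_d * param + b_d.
    """
    n = len(params)
    if n < 2 or n != len(profiles):
        raise ValueError("need at least two (parameter, profile) pairs")
    mean_x = sum(params) / n
    var_x = sum((x - mean_x) ** 2 for x in params)
    if var_x == 0:
        raise ValueError("task parameters must vary")
    coeffs: LinearMap = []
    for d in range(len(profiles[0])):
        ys = [p[d] for p in profiles]
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(params, ys))
        slope = cov / var_x
        coeffs.append((slope, mean_y - slope * mean_x))
    return coeffs


def predict(coeffs: LinearMap, x: float) -> List[float]:
    """Evaluate a fitted mapping at task parameter x."""
    return [a * x + b for a, b in coeffs]


def mapping_discrepancy(human_map: LinearMap, model_map: LinearMap,
                        grid: Sequence[float]) -> float:
    """Compare two mappings: mean squared difference over a parameter grid."""
    total, count = 0.0, 0
    for x in grid:
        for h, m in zip(predict(human_map, x), predict(model_map, x)):
            total += (h - m) ** 2
            count += 1
    return total / count


def reliability_score(discrepancy: float) -> float:
    """Hypothetical score in (0, 1]; 1.0 means the mappings coincide on the grid."""
    return 1.0 / (1.0 + discrepancy)


if __name__ == "__main__":
    task_params = [0.0, 0.5, 1.0]
    # Illustrative (made-up) two-dimensional behavior profiles.
    human = fit_linear_map(task_params, [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])
    model = fit_linear_map(task_params, [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]])
    d = mapping_discrepancy(human, model, task_params)
    print(d, reliability_score(d))
```

In this toy example the model's profile changes with the task parameter more weakly than the human's, so the discrepancy is positive and the score falls below 1.0. A richer function approximator or a different comparison metric could be swapped in without changing the three-step structure.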


Bibliographic data

Publication number: KR20220043509A

Patent type:
Publication date: 2022-04-05

Original format: PDF
Applicant/Patentee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)

Application number: KR20200126999
Inventors: 이상완; 김동재; 신재훈

Application date: 2020-09-29
Classification number: G06N20
Country: KR
Database entry time: 2022-08-25 00:36:43
