ACTIVE IMITATION LEARNING IN HIGH DIMENSIONAL CONTINUOUS ENVIRONMENTS

Abstract

According to one embodiment, a computer-implemented method for active imitation learning includes: providing training data comprising an expert trajectory to a processor; querying the expert trajectory during an iterative, active learning process; generating a decision policy based at least in part on the expert trajectory and a result of querying the expert trajectory; attempting to distinguish the decision policy from the expert trajectory; in response to distinguishing the decision policy from the expert trajectory, outputting a policy update and generating a new decision policy based at least in part on the policy update; and in response to not distinguishing the decision policy from the expert trajectory, outputting the decision policy. Importantly, the expert trajectory is queried in only a subset of the iterations of the iterative, active learning process, and the most uncertain state/action pair(s) from the expert trajectory are determined using one or more disagreement functions.
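The loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the policy is a bootstrap ensemble of linear least-squares models, the disagreement function is the variance of the ensemble's predicted actions, and the "attempt to distinguish" step is replaced by a fixed-threshold error check against the expert trajectory rather than a learned discriminator. The names `disagreement`, `most_uncertain`, `fit_ensemble`, `distinguishable`, and `active_imitation` are hypothetical.

```python
import numpy as np

def disagreement(preds):
    """Ensemble variance per state, summed over action dimensions.
    preds has shape (n_members, n_states, action_dim)."""
    return preds.var(axis=0).sum(axis=-1)

def most_uncertain(preds, k=1):
    """Indices of the k states on which the ensemble members disagree most."""
    return np.argsort(-disagreement(preds))[:k]

def fit_ensemble(states, actions, n_members, rng):
    """Fit linear policies on bootstrap resamples of the labeled pairs (an assumed model class)."""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(states), size=len(states))
        W, *_ = np.linalg.lstsq(states[idx], actions[idx], rcond=None)
        members.append(W)
    return np.stack(members)  # (n_members, state_dim, action_dim)

def distinguishable(policy, states, actions, tol):
    """Stand-in for a discriminator: True if the policy's actions differ
    from the expert's by more than tol in mean squared error."""
    return float(np.mean((states @ policy - actions) ** 2)) > tol

def active_imitation(states, actions, n_iters=20, query_every=2,
                     n_members=5, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    labeled = [0, 1]          # expert pairs revealed to the learner so far
    n_queries = 0
    ensemble = fit_ensemble(states[labeled], actions[labeled], n_members, rng)
    policy = ensemble.mean(axis=0)
    for it in range(n_iters):
        pool = [i for i in range(len(states)) if i not in labeled]
        # Query the expert only on a subset of iterations.
        if it % query_every == 0 and pool:
            preds = states[pool] @ ensemble  # (n_members, n_pool, action_dim)
            pick = pool[int(most_uncertain(preds, k=1)[0])]
            labeled.append(pick)
            n_queries += 1
        # Policy update: refit on everything queried so far.
        ensemble = fit_ensemble(states[labeled], actions[labeled], n_members, rng)
        policy = ensemble.mean(axis=0)
        if not distinguishable(policy, states, actions, tol):
            break  # policy can no longer be told apart from the expert: output it
    return policy, n_queries
```

With `query_every=2`, at most half of the iterations issue a query, which mirrors the abstract's point that the expert is consulted on only a subset of iterations. A real system would presumably use a learned discriminator and nonlinear policies; the linear models here only keep the sketch short.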

Bibliographic details

Publication number: US2020082257A1

Patent type:
Publication date: 2020-03-12

Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Application number: US201816124138
Inventors: MU QIAO; DYLAN J. FITZPATRICK; DIVYESH JADAV

Filing date: 2018-09-06
Classification: G06N3/08; G06N3/04
Country: US
Date indexed: 2022-08-21 11:25:00
