ACTIVE IMITATION LEARNING IN HIGH DIMENSIONAL CONTINUOUS ENVIRONMENTS

Abstract

According to one embodiment, a computer-implemented method for active imitation learning includes: providing training data comprising an expert trajectory to a processor; querying the expert trajectory during an iterative active learning process; generating a decision policy based at least in part on the expert trajectory and a result of querying the expert trajectory; attempting to distinguish the decision policy from the expert trajectory; in response to distinguishing the decision policy from the expert trajectory, outputting a policy update and generating a new decision policy based at least in part on the policy update; and in response to not distinguishing the decision policy from the expert trajectory, outputting the decision policy. Importantly, the expert trajectory is queried for only a subset of iterations of the iterative active learning process, and the most uncertain state/action pair(s) from the expert trajectory are determined using one or more disagreement functions.
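
The loop described in the abstract can be illustrated with a small toy sketch. It is not the patented implementation: the abstract does not specify the policy class, the discriminator, or the disagreement function, so everything below is an assumption made for illustration. The sketch assumes linear policies trained by gradient steps, an ensemble whose members differ only in their random initialisation, action variance across the ensemble as the disagreement function, and a mean-squared action gap as a stand-in for the step that tries to distinguish the policy from the expert. The names `disagreement` and `active_imitation` are hypothetical.

```python
import numpy as np


def disagreement(ensemble_actions):
    """Per-state disagreement (assumed form): action variance across members.

    ensemble_actions has shape (n_members, n_states, action_dim).
    """
    return ensemble_actions.var(axis=0).sum(axis=-1)


def active_imitation(states, expert_actions, n_members=5, n_iters=30,
                     query_every=3, lr=0.5, tol=1e-4, seed=0):
    """Toy loop with linear policies a = s @ W, trained on queried pairs only."""
    rng = np.random.default_rng(seed)
    n_states, state_dim = states.shape
    action_dim = expert_actions.shape[1]
    # Ensemble members differ only in their random initialisation.
    W = rng.normal(size=(n_members, state_dim, action_dim))
    is_labelled = np.zeros(n_states, dtype=bool)  # pairs queried so far
    n_queries = 0
    for it in range(n_iters):
        policy = W.mean(axis=0)
        # Stand-in for the discriminator (an assumption): mean squared gap
        # between policy actions and expert actions along the trajectory.
        gap = np.mean((states @ policy - expert_actions) ** 2)
        if gap < tol:
            # Policy is not distinguishable from the expert: output it.
            return policy, n_queries, it
        # Query the expert trajectory on only a subset of iterations.
        if it % query_every == 0 and not is_labelled.all():
            scores = disagreement(states @ W)
            scores[is_labelled] = -np.inf  # never re-query a known pair
            is_labelled[int(np.argmax(scores))] = True
            n_queries += 1
        # Policy update: one gradient step per member on the queried pairs.
        S, A = states[is_labelled], expert_actions[is_labelled]
        W -= lr * (S.T @ (S @ W - A)) / len(S)
    return W.mean(axis=0), n_queries, n_iters


# Hypothetical data: unit-norm states and a linear "expert".
rng = np.random.default_rng(1)
states = rng.normal(size=(50, 3))
states /= np.linalg.norm(states, axis=1, keepdims=True)  # keeps lr=0.5 stable
expert_actions = states @ rng.normal(size=(3, 2))
policy, n_queries, n_steps = active_imitation(states, expert_actions)
print(n_queries, n_steps)
```

With `query_every=3`, the expert trajectory is consulted on at most every third iteration, and each query reveals the single state/action pair on which the ensemble disagrees most. A practical system would likely replace the mean-squared gap with a learned discriminator and the linear policies with a richer model.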