ACTIVE IMITATION LEARNING IN HIGH DIMENSIONAL CONTINUOUS ENVIRONMENTS
Abstract
According to one embodiment, a computer-implemented method for active imitation learning includes: providing training data comprising an expert trajectory to a processor; querying the expert trajectory during an iterative, active learning process; generating a decision policy based at least in part on the expert trajectory and a result of querying the expert trajectory; attempting to distinguish the decision policy from the expert trajectory; in response to distinguishing the decision policy from the expert trajectory, outputting a policy update and generating a new decision policy based at least in part on the policy update; and in response to not distinguishing the decision policy from the expert trajectory, outputting the decision policy. Importantly, the expert trajectory is queried in only a subset of the iterations of the iterative, active learning process, and the most uncertain state/action pair(s) from the expert trajectory are determined using one or more disagreement functions.
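The loop described in the abstract can be illustrated with a toy sketch. The abstract does not specify the policy class, the disagreement function, or the discriminator, so everything below is an assumption made for illustration: the policy is a small ensemble of one-parameter linear models, disagreement is the variance of the ensemble's predicted actions, and the discriminator is replaced by a simple threshold test on the gap between policy and expert actions. All function names and parameters (`disagreement`, `distinguishable`, `active_imitation`, `query_every`, `tol`, `lr`) are hypothetical and do not come from the patent.

```python
import statistics


def disagreement(ensemble, state):
    """Assumed disagreement function: variance of the ensemble's actions."""
    return statistics.pvariance([w * state for w in ensemble])


def policy_action(ensemble, state):
    """Decision policy: mean ensemble weight applied to the state."""
    return sum(ensemble) / len(ensemble) * state


def distinguishable(ensemble, states, labeled, tol):
    """Stand-in discriminator (an assumption, not the patent's method).

    Reports True when the mean squared gap between policy actions and the
    queried expert actions exceeds `tol`.
    """
    gaps = [(policy_action(ensemble, states[i]) - a) ** 2
            for i, a in labeled.items()]
    return sum(gaps) / len(gaps) > tol


def policy_update(ensemble, states, labeled, lr):
    """One gradient step per ensemble member on the queried pairs."""
    updated = []
    for w in ensemble:
        grad = sum(2 * (w * states[i] - a) * states[i]
                   for i, a in labeled.items()) / len(labeled)
        updated.append(w - lr * grad)
    return updated


def active_imitation(expert, ensemble, query_every=5, max_iters=200,
                     lr=0.1, tol=1e-3):
    """Return (policy_weight, number_of_queries, iterations_run)."""
    states = [s for s, _ in expert]
    labeled = {}  # index into the expert trajectory -> expert action
    iters = 0
    for it in range(max_iters):
        iters = it + 1
        # Query the expert in only a subset of iterations.
        if it % query_every == 0 and len(labeled) < len(expert):
            candidates = [i for i in range(len(states)) if i not in labeled]
            # Pick the state where the ensemble disagrees the most.
            pick = max(candidates,
                       key=lambda i: disagreement(ensemble, states[i]))
            labeled[pick] = expert[pick][1]
        # Not distinguishable from the expert: output the policy.
        if not distinguishable(ensemble, states, labeled, tol):
            break
        # Distinguishable: apply a policy update and try again.
        ensemble = policy_update(ensemble, states, labeled, lr)
    return sum(ensemble) / len(ensemble), len(labeled), iters


if __name__ == "__main__":
    # Toy expert trajectory: the expert's action is twice the state.
    expert = [(0.1 * k, 2.0 * 0.1 * k) for k in range(1, 11)]
    weight, queries, iters = active_imitation(expert, [0.0, 1.0, 3.0])
    print(weight, queries, iters)
```

In this toy setting the expert is consulted only every `query_every` iterations, so the number of queries stays well below the number of learning iterations, which is the property the abstract emphasizes. A realistic high-dimensional continuous setting would need a learned discriminator and a richer policy class, neither of which this sketch attempts.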