CONTROLLING AN AGENT TO EXPLORE AN ENVIRONMENT USING OBSERVATION LIKELIHOODS

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes, while training a neural network used to control the agent, generating a reward value for the training as a measure of the divergence between the likelihoods of a further observation under first and second statistical models of the environment. The first and second statistical models are based on respective first and second histories of past observations and actions, and the most recent observation in the first history is more recent than the most recent observation in the second history.
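The reward described in the abstract can be illustrated with a minimal sketch. The abstract does not say which divergence measure is used, so the log-likelihood ratio below is an assumption chosen for simplicity. The function name, the dictionary-based predictive distributions, and the toy observations are all hypothetical and are not taken from the patent.

```python
import math


def likelihood_divergence_reward(p_first, p_second, observation, eps=1e-12):
    """Return a reward from the likelihoods of one observation under two models.

    p_first:  predictive distribution of the model conditioned on the first
              (more recent) history, given as a dict observation -> probability.
    p_second: predictive distribution of the model conditioned on the second
              (older) history, in the same form.
    observation: the further observation actually received.

    Assumption: the divergence is taken to be the log-likelihood ratio
    log p_first(o) - log p_second(o). The abstract leaves the measure open.
    """
    # Clamp to eps so an observation with zero probability does not raise.
    l_first = max(p_first.get(observation, 0.0), eps)
    l_second = max(p_second.get(observation, 0.0), eps)
    return math.log(l_first) - math.log(l_second)


# Toy example: the model with the more recent history predicts the
# observation better than the model with the older history.
p_recent = {"door_open": 0.8, "door_closed": 0.2}
p_older = {"door_open": 0.5, "door_closed": 0.5}

reward = likelihood_divergence_reward(p_recent, p_older, "door_open")
no_gain = likelihood_divergence_reward(p_older, p_older, "door_open")
```

Under this assumption the reward is positive when the more recent history makes the further observation more likely, and zero when both models assign it the same likelihood, so the agent would be rewarded for reaching situations where recent experience changes its predictions.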
