TRAINING ACTION SELECTION NEURAL NETWORKS USING LOOK-AHEAD SEARCH

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an action selection neural network. One of the methods includes receiving an observation characterizing a current state of the environment; determining a target network output for the observation by performing a look-ahead search of possible future states of the environment starting from the current state until the environment reaches a possible future state that satisfies one or more termination criteria, wherein the look-ahead search is guided by the neural network in accordance with current values of the network parameters; selecting an action to be performed by the agent in response to the observation using the target network output generated by performing the look-ahead search; and storing, in an exploration history data store, the target network output in association with the observation for use in updating the current values of the network parameters.
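
The steps in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy counting environment (`step`), the uniform-prior stand-in for the neural network (`network`), the PUCT-style scoring rule, the use of normalized root visit counts as the target network output, and the in-memory list standing in for the exploration history data store.

```python
import math
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical toy actions: add 1 or add 2 to a counter
GOAL = 5          # hypothetical termination threshold

def step(state, action):
    """Toy environment model (assumption): returns (next_state, reward, done)."""
    nxt = state + action + 1
    done = nxt >= GOAL                      # termination criterion
    reward = 1.0 if nxt == GOAL else 0.0    # reward only for landing exactly on GOAL
    return nxt, reward, done

def network(state):
    """Stand-in for the action selection network: uniform priors, zero value."""
    return [1.0 / len(ACTIONS)] * len(ACTIONS), 0.0

def look_ahead_search(root, num_simulations=200, c_puct=1.0):
    """Search from `root` to terminal states, guided by the network priors.

    Returns normalized root visit counts as the target output (assumption).
    """
    visits = defaultdict(int)    # (state, action) -> visit count
    total = defaultdict(float)   # (state, action) -> summed return
    for _ in range(num_simulations):
        state, path, done, ret = root, [], False, 0.0
        while not done:
            priors, _ = network(state)
            n_state = sum(visits[(state, a)] for a in ACTIONS)

            def score(a):
                # PUCT-style score: mean return plus a prior-weighted bonus.
                n = visits[(state, a)]
                q = total[(state, a)] / n if n else 0.0
                return q + c_puct * priors[a] * math.sqrt(n_state + 1) / (1 + n)

            action = max(ACTIONS, key=score)
            path.append((state, action))
            state, reward, done = step(state, action)
            ret += reward
        for key in path:         # back up the return along the visited path
            visits[key] += 1
            total[key] += ret
    counts = [visits[(root, a)] for a in ACTIONS]
    return [c / sum(counts) for c in counts]

exploration_history = []  # stand-in for the exploration history data store

def act(observation):
    """Run the search, pick an action from its output, and store the pair."""
    target = look_ahead_search(observation)
    action = max(ACTIONS, key=lambda a: target[a])
    exploration_history.append((observation, target))
    return action
```

Calling `act(4)` runs the search from state 4, selects the action with the largest share of root visits, and appends the `(observation, target)` pair to `exploration_history`. A separate training step, not shown, would sample such pairs and update the network parameters so that its output moves toward the stored targets.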
