REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions: processing a network input using a Q neural network to generate a Q value that represents a return received if the candidate action is selected while the candidate action is presented in response to the received observation, processing the network input using a myopic neural network to generate a myopic output that represents a likelihood that the candidate action will be selected if the candidate action is presented in response to the received observation, and combining the myopic output and the Q value for the candidate action to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
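The selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the network input is built or how the myopic output and Q value are combined, so the sketch assumes the input is the observation concatenated with a candidate's feature vector and that the two outputs are multiplied. The names `select_candidates`, `toy_q_network`, and `toy_myopic_network` are hypothetical, and the toy networks are random linear stand-ins for trained models.

```python
import numpy as np


def select_candidates(observation, candidates, q_network, myopic_network, num_to_select):
    """Score each candidate action and return the indices of the highest-scoring ones."""
    scores = []
    for action in candidates:
        # Assumption: the network input is the observation joined with the action features.
        network_input = np.concatenate([observation, action])
        # Q value: estimated return if this candidate is presented and selected.
        q_value = q_network(network_input)
        # Myopic output: estimated likelihood that this candidate is selected if presented.
        myopic_output = myopic_network(network_input)
        # Assumption: the two outputs are combined by multiplication.
        scores.append(myopic_output * q_value)
    scores = np.asarray(scores, dtype=float)
    # Sort by descending score; a stable sort keeps ties in their original order.
    order = np.argsort(-scores, kind="stable")
    return order[:num_to_select].tolist(), scores


# Hypothetical stand-ins for trained networks: fixed random linear models.
rng = np.random.default_rng(0)
_w_q = rng.normal(size=6)
_w_m = rng.normal(size=6)


def toy_q_network(x):
    return float(x @ _w_q)


def toy_myopic_network(x):
    # A sigmoid keeps the output in (0, 1), so it can be read as a likelihood.
    return float(1.0 / (1.0 + np.exp(-(x @ _w_m))))


if __name__ == "__main__":
    observation = rng.normal(size=4)                       # current state of the environment
    candidates = [rng.normal(size=2) for _ in range(5)]    # five candidate actions
    chosen, scores = select_candidates(
        observation, candidates, toy_q_network, toy_myopic_network, num_to_select=2
    )
    print("selection scores:", scores)
    print("selected candidate indices:", chosen)
```

Scoring each candidate independently and then taking the top few avoids enumerating every possible combination of actions, which is what makes the approach practical when the action space is combinatorial.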
