REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes: receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions, (i) processing a network input using a Q neural network to generate a Q value representing the return received if the candidate action is selected when it is presented in response to the received observation, (ii) processing the network input using a myopic neural network to generate a myopic output representing the likelihood that the candidate action will be selected if it is presented in response to the received observation, and (iii) combining the myopic output and the Q value to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
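The abstract does not say how the myopic output and the Q value are combined. The Python sketch below assumes they are multiplied, so each candidate's return is weighted by how likely it is to be selected when presented, and then keeps the k highest-scoring candidates. The function names (`selection_scores`, `select_top_k`), the modeling of the "network input" as an (observation, action) pair, and the value of k are all hypothetical; this is an illustration under those assumptions, not the claimed implementation.

```python
from typing import Callable, Hashable, List, Sequence

# Hypothetical signature for both networks: (observation, candidate_action) -> float.
# Modeling the abstract's "network input" as an (observation, action) pair is an
# assumption of this sketch.
Network = Callable[[Sequence[float], Hashable], float]


def selection_scores(
    observation: Sequence[float],
    candidates: Sequence[Hashable],
    q_net: Network,
    myopic_net: Network,
) -> List[float]:
    """Compute one selection score per candidate action.

    Assumption: score = myopic output (likelihood the action is selected if
    presented) * Q value (return if it is selected). The abstract only says
    the two are "combined".
    """
    scores = []
    for action in candidates:
        q_value = q_net(observation, action)          # estimated return if selected
        likelihood = myopic_net(observation, action)  # P(selected | presented)
        scores.append(likelihood * q_value)
    return scores


def select_top_k(
    observation: Sequence[float],
    candidates: Sequence[Hashable],
    q_net: Network,
    myopic_net: Network,
    k: int,
) -> List[Hashable]:
    """Return the k candidates with the highest selection scores."""
    scores = selection_scores(observation, candidates, q_net, myopic_net)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [action for action, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy lookup tables standing in for trained networks (illustrative values only).
    q_table = {0: 5.0, 1: 2.0, 2: 8.0, 3: 1.0}
    p_table = {0: 0.5, 1: 0.9, 2: 0.1, 3: 0.8}
    picked = select_top_k(
        observation=[0.0],
        candidates=[0, 1, 2, 3],
        q_net=lambda obs, a: q_table[a],
        myopic_net=lambda obs, a: p_table[a],
        k=2,
    )
    print(picked)  # → [0, 1]
```

In the toy run, candidate 2 has the largest Q value (8.0) but a low selection likelihood (0.1), so its combined score (0.8) falls below candidates 0 (2.5) and 1 (1.8) and it is not chosen. This is the effect the myopic term is meant to capture: a high-return action contributes little if it is unlikely to be selected.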

Bibliographic data

Publication number: WO2019222746A1
Publication date: 2019-11-21
Original format: PDF
Applicant/Assignee: GOOGLE LLC
Application number: WO2019US33141
Inventors: IE TZE WAY EUGENE; JAIN VIHAN; WANG JING; AGARWAL RITESH; BOUTILIER CRAIG EDGAR
Filing date: 2019-05-20
Classification: G06N3; G06N3/04; G06N3/08
Country/Office: WO