首页>
外国专利>
REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES
REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES
展开▼
机译:组合动作空间中的强化学习
展开▼
页面导航
摘要
著录项
相似文献
摘要
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions: processing a network input using a Q neural network to generate a Q value that represents a return received if the candidate action is selected while the candidate action is presented in response to the received observation, processing the network input using a myopic neural network to generate a myopic output that represents a likelihood that the candidate action will be selected if the candidate action is presented in response to the received observation, and combining the myopic output and the Q value for the candidate action to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
展开▼