STACKED CONVOLUTIONAL LONG SHORT-TERM MEMORY FOR MODEL-FREE REINFORCEMENT LEARNING

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent interacting with an environment. One of the methods includes obtaining a representation of an observation; processing the representation using a convolutional long short-term memory (LSTM) neural network comprising a plurality of convolutional LSTM neural network layers; processing an action selection input comprising the final LSTM hidden state output for the time step using an action selection neural network that is configured to receive the action selection input and to process the action selection input to generate an action selection output that defines an action to be performed by the agent at the time step; selecting, from the action selection output, the action to be performed by the agent at the time step in accordance with an action selection policy; and causing the agent to perform the selected action.
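The control loop described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the patented implementation: the names `ConvLSTMLayer`, `conv2d_same`, and `agent_step`, the layer sizes, the single linear layer standing in for the action selection neural network, and softmax sampling as the action selection policy are all assumptions made for the example. The "final LSTM hidden state output" is taken here to be the top layer's hidden state after one pass through the stack, which is also an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) across all input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

class ConvLSTMLayer:
    """Hypothetical convolutional LSTM layer: gates are computed by convolution."""

    def __init__(self, in_channels, hidden_channels, kernel_size, rng):
        self.hidden_channels = hidden_channels
        # One kernel produces all four gates from the stacked [input, previous hidden] maps.
        self.w = rng.normal(0.0, 0.1, (4 * hidden_channels,
                                       in_channels + hidden_channels,
                                       kernel_size, kernel_size))
        self.b = np.zeros((4 * hidden_channels, 1, 1))

    def initial_state(self, height, width):
        zeros = np.zeros((self.hidden_channels, height, width))
        return zeros, zeros.copy()

    def step(self, x, state):
        h_prev, c_prev = state
        gates = conv2d_same(np.concatenate([x, h_prev], axis=0), self.w) + self.b
        i, f, o, g = np.split(gates, 4, axis=0)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
        h = sigmoid(o) * np.tanh(c)                        # new hidden state
        return h, c

def agent_step(observation_repr, layers, states, policy_w, policy_b, rng):
    """One time step: stacked ConvLSTM, then action selection output, then sampled action."""
    x = observation_repr
    new_states = []
    for layer, state in zip(layers, states):
        h, c = layer.step(x, state)
        new_states.append((h, c))
        x = h  # each layer's hidden state is the next layer's input
    # Action selection input: the final hidden state, flattened (assumed linear head).
    logits = policy_w @ x.reshape(-1) + policy_b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Assumed action selection policy: sample from the softmax distribution.
    action = int(rng.choice(len(probs), p=probs))
    return action, probs, new_states

# Usage with made-up sizes: a 3-channel 5x5 observation representation and 4 actions.
rng = np.random.default_rng(0)
layers = [ConvLSTMLayer(3, 8, 3, rng), ConvLSTMLayer(8, 8, 3, rng)]
states = [layer.initial_state(5, 5) for layer in layers]
num_actions = 4
policy_w = rng.normal(0.0, 0.1, (num_actions, 8 * 5 * 5))
policy_b = np.zeros(num_actions)
obs = rng.normal(size=(3, 5, 5))
action, probs, states = agent_step(obs, layers, states, policy_w, policy_b, rng)
```

The returned `states` list would be passed back in at the next time step, so the recurrent state persists across the agent's interaction with the environment.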

Bibliographic data

Publication number: WO2020065024A1
Patent type:
Publication date: 2020-04-02
Original format: PDF
Applicant/Assignee: DEEPMIND TECHNOLOGIES LIMITED

Application number: WO2019EP76213
Inventors: MIRZA MOHAMMADI MEHDI; GUEZ ARTHUR CLEMENT; GREGOR KAROL; KABRA RISHABH

Filing date: 2019-09-27
Classification: G06N3; G06N3/04; G06N3/08; G06N7
Country: WO
Added to database: 2022-08-21 11:12:21
