TRAINING ACTION SELECTION NEURAL NETWORKS USING HINDSIGHT MODELLING

Abstract

A reinforcement learning method and system that select actions to be performed by a reinforcement learning agent interacting with an environment. A causal model is implemented by a hindsight model neural network and trained using hindsight, i.e., using future environment state trajectories. Because this future information is not available when an action is selected, the hindsight model neural network is used to train a model neural network that is conditioned on data from current observations and learns to predict an output of the hindsight model neural network.
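The two-stage training described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: scalar linear weights stand in for the neural networks, the environment is invented for illustration, and every name (`train`, `predict_from_observation`, `a`, `b`) is hypothetical. The hindsight weight `a` sees a summary of the future trajectory and is fitted to the return; the model weight `b` sees only the current observation and is regressed onto the hindsight output, which is treated as a fixed target.

```python
import random


def train(steps=2000, lr=0.05, seed=0):
    """Toy hindsight modelling with scalar linear stand-ins (assumed setup)."""
    rng = random.Random(seed)
    a = 0.0  # hindsight model weight: phi = a * future (sees the future)
    b = 0.0  # model weight: phi_hat = b * obs (sees only the current observation)
    for _ in range(steps):
        # Invented toy environment: the future summary depends on the
        # observation plus noise, and the return depends on the future.
        obs = rng.uniform(-1.0, 1.0)
        future = 2.0 * obs + rng.gauss(0.0, 0.1)
        ret = 3.0 * future

        # Step 1: train the hindsight model on future information,
        # minimising the squared error between phi and the return.
        phi = a * future
        a -= lr * 2.0 * (phi - ret) * future

        # Step 2: train the model to predict the hindsight output from the
        # current observation only. The target is a plain number here, so
        # no gradient flows back into the hindsight weight.
        target = a * future
        phi_hat = b * obs
        b -= lr * 2.0 * (phi_hat - target) * obs
    return a, b


def predict_from_observation(b, obs):
    """At action-selection time only the observation-conditioned model is used."""
    return b * obs


if __name__ == "__main__":
    a, b = train()
    print("hindsight weight:", a)
    print("model weight:", b)
    print("prediction for obs=0.5:", predict_from_observation(b, 0.5))
```

In this toy setup the hindsight weight should settle close to 3 and the model weight close to 6, since the expected hindsight output given the observation is 6 times the observation. At action time only `predict_from_observation` is called, mirroring the point that future information is unavailable when an action is selected.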

Bibliographic details

Publication number: WO2021058588A1
Patent type:
Publication date: 2021-04-01
Original format: PDF
Applicant/assignee: DEEPMIND TECHNOLOGIES LIMITED
Application number: WO2020EP76604
Inventors: GUEZ ARTHUR CLEMENT; VIOLA FABIO; WEBER THEOPHANE GUILLAUME; BUESING LARS; HEESS NICOLAS MANFRED OTTO
Filing date: 2020-09-23
Classification: G06N3/02; G06K9/62
Country: EP
Date indexed: 2022-08-24 18:03:49
