Distributed training using off-policy actor-critic reinforcement learning

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network that is used to select actions to be performed by an agent interacting with an environment. In one embodiment, the system includes a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate trajectories of experience tuples, which the learner computing units use to update the parameters of a learner action selection neural network using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor-critic reinforcement learning technique.
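
The actor/learner arrangement described in the abstract can be illustrated with a small, single-process sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the two-action toy environment, the class names `Actor` and `Learner`, the tabular softmax policy standing in for a neural network, the scalar value estimate standing in for a critic, and the truncated importance weight used as the off-policy correction. Real actor and learner computing units would run in parallel on separate hardware; here they take turns in a loop.

```python
import math
import random
from collections import deque, namedtuple

# One experience tuple: observation, action, reward, plus the probability
# that the actor's (behaviour) policy gave to the action it took.
Experience = namedtuple("Experience", "obs action reward behaviour_prob")


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Actor:
    """Hypothetical actor unit acting with a possibly stale parameter copy."""

    def __init__(self, rng):
        self.rng = rng
        self.logits = [0.0, 0.0]

    def sync(self, learner_logits):
        # Copy the learner's current policy parameters.
        self.logits = list(learner_logits)

    def generate_trajectory(self, length):
        probs = softmax(self.logits)
        trajectory = []
        for _ in range(length):
            action = 0 if self.rng.random() < probs[0] else 1
            reward = 1.0 if action == 1 else 0.0  # toy environment (assumed)
            trajectory.append(Experience(0, action, reward, probs[action]))
        return trajectory


class Learner:
    """Hypothetical learner unit: softmax policy plus a scalar value baseline."""

    def __init__(self, lr=0.1):
        self.logits = [0.0, 0.0]
        self.value = 0.0
        self.lr = lr

    def update(self, trajectory):
        for exp in trajectory:
            probs = softmax(self.logits)
            # Truncated importance weight: corrects for the gap between the
            # learner's policy and the actor's behaviour policy (assumed form).
            rho = min(1.0, probs[exp.action] / exp.behaviour_prob)
            advantage = exp.reward - self.value
            for a in range(len(self.logits)):
                grad_log = (1.0 if a == exp.action else 0.0) - probs[a]
                self.logits[a] += self.lr * rho * advantage * grad_log
            # Move the value estimate towards the observed reward.
            self.value += self.lr * rho * advantage


def train(num_actors=4, rounds=200, trajectory_length=5, sync_every=10, seed=0):
    rng = random.Random(seed)
    learner = Learner()
    actors = [Actor(rng) for _ in range(num_actors)]
    queue = deque()  # stands in for the channel between actors and learner
    for step in range(rounds):
        if step % sync_every == 0:  # actors lag behind between syncs
            for actor in actors:
                actor.sync(learner.logits)
        for actor in actors:
            queue.append(actor.generate_trajectory(trajectory_length))
        while queue:
            learner.update(queue.popleft())
    return learner
```

Calling `train()` returns the learner, and `softmax(learner.logits)` gives its action probabilities. Because the actors only synchronise every `sync_every` rounds, most trajectories are generated by a policy that differs from the learner's current one, which is the sense in which the update is off-policy.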

Bibliographic details

Publication number: JP2021513128A

Patent type:
Publication date: 2021-05-20

Original format: PDF
Applicant/patentee: ディープマインドテクノロジーズリミテッド

Application number: JP20200529199
Inventors: フーベルト・ヨーゼフ・ソイヤー; ラッセ・エスペホルト; カレン・シモニアン; ヨタム・ドロン; ヴラッド・フィロイウ; ヴォロディミル・ムニヒ; コーレイ・カヴクチュオグル; レミ・ムノス; トーマス・ウォード; ティモシー・ジェームズ・アレクサンダー・ハーレー; イアン・ダニング

Filing date: 2019-02-05
Classification: G06N3/08
Country: JP
Database entry time: 2022-08-24 18:49:27
