ACCELERATED DEEP REINFORCEMENT LEARNING OF AGENT CONTROL POLICIES
Abstract
Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for training a mixture of a plurality of actor-critic policies that is used to control an agent interacting with an environment to perform a task. Each actor-critic policy includes an actor policy and a critic policy. The training includes, for each of one or more transitions, determining a target Q value for the transition from (i) the reward in the transition and (ii) an imagined return estimate generated by performing one or more iterations of a prediction process to generate one or more predicted future transitions.
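The abstract does not specify how the reward and the imagined return are combined. The following is a minimal Python sketch that assumes a model-based value expansion style target. A hypothetical learned model `model(state, action) -> (next_state, reward)` is rolled forward for `horizon` steps under one mixture member's actor. The predicted rewards are discounted and summed, and that member's critic bootstraps the remainder. The names `ActorCritic`, `select_member`, `imagined_return`, and `target_q`, the scalar states and actions, and the rule for picking a mixture member are all illustrative assumptions, not the patented method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = float   # assumption: scalar states, for illustration only
Action = float  # assumption: scalar actions, for illustration only

# Hypothetical learned model: maps (state, action) to (predicted next state, predicted reward).
Model = Callable[[State, Action], Tuple[State, float]]


@dataclass
class ActorCritic:
    """One member of the mixture: an actor (policy) and a critic (Q-function)."""
    actor: Callable[[State], Action]
    critic: Callable[[State, Action], float]


def select_member(members: Sequence[ActorCritic], state: State) -> ActorCritic:
    """Hypothetical selection rule: pick the member whose critic rates its own actor's action highest."""
    return max(members, key=lambda m: m.critic(state, m.actor(state)))


def imagined_return(next_state: State, policy: ActorCritic, model: Model,
                    horizon: int, gamma: float) -> float:
    """Roll the model forward `horizon` steps from `next_state` under the policy's actor,
    sum the discounted predicted rewards, then bootstrap with the critic."""
    total, discount, s = 0.0, 1.0, next_state
    for _ in range(horizon):
        a = policy.actor(s)
        s, r = model(s, a)        # one predicted future transition
        total += discount * r
        discount *= gamma
    # Bootstrap from the critic at the last imagined state.
    return total + discount * policy.critic(s, policy.actor(s))


def target_q(reward: float, next_state: State, policy: ActorCritic, model: Model,
             horizon: int, gamma: float) -> float:
    """Target Q value: observed reward plus the discounted imagined return estimate."""
    return reward + gamma * imagined_return(next_state, policy, model, horizon, gamma)
```

With `horizon=0`, no transitions are imagined and the target reduces to the ordinary one-step TD target r + γQ(s', π(s')). Larger horizons rely more on the model's predictions and less on the critic's estimate.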