PLoS Computational Biology

Predictive representations can link model-based reinforcement learning to model-free mechanisms


Abstract

Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.

