Learning & Memory

Model-based reinforcement learning under concurrent schedules of reinforcement in rodents



Abstract

Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's knowledge or model of the environment in model-based reinforcement learning algorithms. To investigate how animals update value functions, we trained rats under two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas it increased over time since the target was last chosen in the other task. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former task, indicating that the animals were aware of time-dependent increases in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through the knowledge of their environments.
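The contrast the abstract draws between the two algorithm classes can be sketched in a few lines. Below is a minimal, hypothetical illustration (not the authors' actual model): a simple model-free update changes only the chosen target's value via the observed outcome, while a model-based update also revises the unchosen target's value using knowledge of the schedule, namely that arming probability grows with the number of trials since that target was last chosen. The learning rate `ALPHA` and per-trial arming rate `TAU` are assumed values for illustration.

```python
ALPHA = 0.1  # learning rate (assumed value for illustration)
TAU = 0.2    # per-trial arming rate of an unchosen target (assumed)

def model_free_update(q, choice, reward):
    """Simple (model-free) RL: only the chosen target's value changes,
    and only through the trial-and-error prediction error."""
    q[choice] += ALPHA * (reward - q[choice])
    return q

def model_based_update(q, choice, reward, trials_since_last):
    """Model-based RL sketch: in addition to the usual update, the agent's
    model of the schedule sets the unchosen target's value from the
    time-dependent arming probability, without having to sample it."""
    q[choice] += ALPHA * (reward - q[choice])
    other = 1 - choice
    # Schedule knowledge: P(armed after n unchosen trials) = 1 - (1 - TAU)**n
    q[other] = 1 - (1 - TAU) ** trials_since_last[other]
    return q
```

Under the model-based rule, the longer the alternative goes unchosen, the higher its estimated value climbs, which is exactly the pattern the rats' increasing goal-choice probability suggests in the second task.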
