Risk-Sensitive Reinforcement Learning


Abstract

We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subject's responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition, we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
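
To make the central idea concrete, the following is a minimal Python sketch of a Q-learning step in which a utility function is applied to the TD error before the value is updated. It is an illustration under stated assumptions, not the authors' implementation: the piecewise-linear utility, the `loss_weight` parameter, the step size, and the one-state "safe vs. risky" task are all hypothetical choices made here, and the paper's exact update rule and utility family are not reproduced.

```python
import random


def utility(td_error, loss_weight=2.0):
    """Hypothetical piecewise-linear utility: negative TD errors (losses)
    are weighted `loss_weight` times as strongly as positive ones."""
    return td_error if td_error >= 0 else loss_weight * td_error


def risk_sensitive_update(q, state, action, reward, next_state,
                          alpha=0.01, gamma=0.9, u=utility):
    """One tabular Q-learning step with the utility applied to the TD error.

    `q` maps state -> {action: value}. `next_state=None` marks a terminal
    transition. Returns the raw (untransformed) TD error.
    """
    next_value = max(q[next_state].values()) if next_state is not None else 0.0
    td_error = reward + gamma * next_value - q[state][action]
    q[state][action] += alpha * u(td_error)
    return td_error


def run_bandit(u=utility, steps=20000, seed=0):
    """Toy one-state task (an assumption of this sketch): 'safe' always
    pays 0, 'risky' pays +1 or -1 with equal probability, so both actions
    have the same expected reward."""
    rng = random.Random(seed)
    q = {"s": {"safe": 0.0, "risky": 0.0}}
    for _ in range(steps):
        action = rng.choice(["safe", "risky"])
        reward = 0.0 if action == "safe" else rng.choice([1.0, -1.0])
        risk_sensitive_update(q, "s", action, reward, None, u=u)
    return q["s"]


if __name__ == "__main__":
    print("loss-averse utility:", run_bandit(utility))
    print("identity utility:   ", run_bandit(lambda x: x))
```

In this toy task both actions have the same mean reward, yet with the loss-averse utility the value of the risky action settles below that of the safe one: solving E[u(r - Q)] = 0 for `loss_weight = 2` gives Q = -1/3, and the learned estimate should fluctuate around that point. This is the sense in which transforming the TD error, rather than only the reward, changes the agent's risk preference.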
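Bibliographic details

  • Source
    Neural Computation, 2014, No. 7, pp. 1298-1328 (31 pages)
  • Author affiliation

    Technical University, 10587 Berlin, Germany; yun@ni.tu-berlin.de

  • Indexed in: Science Citation Index (SCI), USA; Chemical Abstracts (CA), USA
  • Original format: PDF
  • Language of text: English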