Journal of the Physical Society of Japan

Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

Abstract

In reward-based learning, the reward is typically delivered with some delay after the behavior that causes it. In the machine learning literature, the eligibility trace framework has been used as one solution for handling delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is also important for a difficult neuroscience problem known as the “distal reward problem”. Node perturbation is one of the stochastic gradient methods among the many implementations of reinforcement learning; it estimates an approximate gradient by injecting perturbations into a network. Because this stochastic gradient method does not require differentiating the objective function, it is expected to be able to account for the learning mechanisms of complex systems such as the brain. We study node perturbation with an eligibility trace as a specific example of delayed reward-based learning and analyze it using a statistical mechanics approach. As a result, we show the optimal time constant of the eligibility trace with respect to the reward delay, and we show that unlearnable parameter configurations exist.
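The abstract describes the learning rule only in words. The following minimal Python sketch illustrates it under one assumption: a linear teacher-student setup. That setup is a common setting for statistical-mechanics analyses of learning, but it is assumed here and not stated in the abstract. The sketch works as follows:

- The student's output is perturbed by Gaussian noise of size `sigma`.
- The baseline-subtracted error reaches the learner only `delay` steps later.
- The weights are updated with that delayed error multiplied by an exponentially decaying eligibility trace with time constant `tau`.

All names and default values here are hypothetical. This includes `node_perturbation_with_trace`, `generalization_error`, `eta`, `sigma`, `tau`, and `delay`. The sketch simulates the rule; it does not reproduce the paper's statistical-mechanics analysis.

```python
import math
import random
from collections import deque


def generalization_error(teacher, student):
    """0.5 * E[(v - y)^2] for inputs x_k ~ N(0, 1) and outputs scaled by
    1/sqrt(N); for a linear teacher and student this equals |B - J|^2 / (2N)."""
    n = len(teacher)
    return sum((b - j) ** 2 for b, j in zip(teacher, student)) / (2 * n)


def node_perturbation_with_trace(n=20, steps=10000, delay=3, tau=4.0,
                                 eta=0.05, sigma=0.1, seed=0):
    """Hypothetical sketch: a linear student J learns a linear teacher B by
    node perturbation, with the error signal delivered `delay` steps late and
    credit assigned through an eligibility trace with time constant `tau`."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]
    student = [0.0] * n
    trace = [0.0] * n
    decay = math.exp(-1.0 / tau)   # per-step decay of the eligibility trace
    pending = deque()              # error signals not yet delivered

    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = scale * sum(b * xk for b, xk in zip(teacher, x))   # teacher output
        y = scale * sum(j * xk for j, xk in zip(student, x))   # student output
        noise = rng.gauss(0.0, 1.0)                            # node perturbation
        # Perturbed error minus the unperturbed baseline, divided by sigma,
        # so that (this value) * noise estimates (y - v) on average.
        e_pert = 0.5 * (y + sigma * noise - v) ** 2
        e_base = 0.5 * (y - v) ** 2
        pending.append((e_pert - e_base) / sigma)
        # Eligibility trace: exponentially decaying memory of perturbation * input.
        trace = [decay * t + noise * scale * xk for t, xk in zip(trace, x)]
        if len(pending) > delay:
            delayed_error = pending.popleft()  # error caused `delay` steps ago
            # Gradient-descent step using the delayed error and the current trace.
            student = [j - eta * delayed_error * t
                       for j, t in zip(student, trace)]
    return generalization_error(teacher, student)
```

In this sketch, the delayed error correlates only with the trace component laid down `delay` steps earlier. That component survives with weight `exp(-delay / tau)`. A `tau` much shorter than the delay therefore shrinks the useful gradient signal. A very long `tau` instead accumulates uncorrelated perturbations that act as noise. This trade-off is one intuitive reading of why an optimal time constant can exist; it is not a restatement of the paper's result.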