Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

Saito Hiroshi; Katahira Kentaro; Okanoya Kazuo; Okada Masato

Abstract

In reward-based learning, a reward is typically given with some delay after the behavior that causes it. In the machine learning literature, the eligibility trace framework has been used as one solution for handling delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is important for a difficult neuroscience problem known as the “distal reward problem”. Node perturbation is one of the stochastic gradient methods among the many implementations of reinforcement learning; it estimates an approximate gradient by introducing a perturbation into a network. Since the stochastic gradient method does not require the derivative of an objective function, it is expected to be able to account for the learning mechanism of a complex system such as the brain. We study node perturbation with the eligibility trace as a specific example of delayed reward-based learning, and analyze it using a statistical mechanics approach. As a result, we show the optimal time constant of the eligibility trace with respect to the reward delay, and the existence of unlearnable parameter configurations.
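
To make the idea in the abstract concrete, here is a minimal sketch of node perturbation combined with an eligibility trace and a delayed error signal. It is not the model analyzed in the paper: it assumes a linear teacher-student perceptron, a squared error, and a reward that arrives a fixed number of steps late. The function name `train` and the parameters `delay`, `decay`, `eta`, and `sigma` are all hypothetical choices made for illustration.

```python
import random
from collections import deque


def train(delay=0, decay=0.0, steps=2000, n=10, eta=0.01, sigma=0.1, seed=0):
    """Illustrative sketch only (assumed setup, not the paper's model).

    Returns (initial_error, final_error), where the error is the squared
    distance between the student weights and the teacher weights.
    """
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed linear teacher
    w = [0.0] * n          # student weights
    trace = [0.0] * n      # eligibility trace
    pending = deque()      # error differences waiting to be delivered

    def sq_dist():
        return sum((a - b) ** 2 for a, b in zip(w, teacher))

    initial = sq_dist()
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        target = sum(a * b for a, b in zip(teacher, x))
        y = sum(a * b for a, b in zip(w, x))

        # Node perturbation: add noise to the output node and compare errors.
        xi = rng.gauss(0.0, sigma)
        delta = 0.5 * (y + xi - target) ** 2 - 0.5 * (y - target) ** 2

        # Eligibility trace: decaying memory of (perturbation * input).
        trace = [decay * e + xi * xj for e, xj in zip(trace, x)]

        # The error difference only becomes available `delay` steps later.
        pending.append(delta)
        if len(pending) > delay:
            delayed = pending.popleft()
            scale = eta * delayed / sigma ** 2
            w = [wj - scale * e for wj, e in zip(w, trace)]
    return initial, sq_dist()
```

With `delay=0` and `decay=0.0` this reduces to plain node perturbation, whose update equals the true gradient step on average. Calling, for example, `train(delay=3, decay=0.8)` lets one explore how the trace decay interacts with the reward delay in this toy setting; under these assumptions a trace with `decay=0.0` carries no memory of the perturbation that produced a delayed reward.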

Bibliographic details

Source: Journal of the Physical Society of Japan, 2010, Issue 6, 6 pages
Authors: Saito Hiroshi; Katahira Kentaro; Okanoya Kazuo; Okada Masato
Original format: PDF
Language: English
Chinese Library Classification: Physics
Keywords: statistical mechanics; delayed reward; eligibility trace; node perturbation; reward-based learning
