Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains

Mizutani Eiji; Dreyfus Stuart

Abstract

For solving a sequential decision-making problem in a non-Markovian domain, standard dynamic programming (DP) requires a complete mathematical model; hence, a totally model-based approach. By contrast, this paper describes a totally model-free approach by actor-critic reinforcement learning with recurrent neural networks. The recurrent connections (or context units) in neural networks act as an implicit form of internal state (i.e., history memory) for developing sensitivity to hidden non-Markovian dependencies, rendering the process Markovian implicitly and automatically in a totally model-free fashion. That is, the model-free recurrent-network agent neither learns transitional probabilities and associated rewards, nor by how much the state space should be enlarged so that the Markov property holds. For concreteness, we illustrate time-lagged path problems, in which our learning agent is expected to learn a best (history-dependent) policy that maximizes the total return, the sum of one-step transitional rewards plus special "bonus" values dependent on prior transitions or decisions. Since we can obtain an optimal solution by model-based DP, this is an excellent test on the learning agent for understanding its model-free learning behavior. Such actor-critic recurrent-network learning might constitute a mechanism which animal brains use when experientially acquiring skilled action. Given a concrete non-Markovian problem example, the goal of this paper is to show the conceptual merit of totally model-free learning with actor-critic recurrent networks, compared with classical DP (and other model-building procedures), rather than pursue a best recurrent-network learning strategy.
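The model-based baseline mentioned in the abstract can be illustrated with a small sketch. The toy problem below is hypothetical and is not taken from the paper: the stage rewards, the lag of two decisions, and the bonus value are all assumptions chosen for illustration. It shows how enlarging the state with the last few decisions restores the Markov property, so that DP on the enlarged state recovers the same optimal total return as exhaustive enumeration. A model-free recurrent-network agent would have to discover this history dependence implicitly instead of being given it.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy instance (not from the paper): 5 stages, 2 actions per stage.
REWARDS = [(1, 2), (3, 1), (2, 2), (1, 4), (3, 0)]  # REWARDS[t][a]: one-step reward
LAG = 2    # assumed time lag: the bonus looks back this many decisions
BONUS = 3  # assumed bonus when the action at stage t repeats the one at t - LAG


def total_return(actions):
    """Total return of a full action sequence: one-step rewards plus lagged bonuses."""
    total = 0
    for t, a in enumerate(actions):
        total += REWARDS[t][a]
        if t >= LAG and actions[t - LAG] == a:
            total += BONUS
    return total


def brute_force():
    """Best total return found by enumerating every action sequence."""
    return max(total_return(seq) for seq in product((0, 1), repeat=len(REWARDS)))


@lru_cache(maxsize=None)
def dp(t, history):
    """Optimal return-to-go from stage t; history holds the last LAG actions.

    The pair (t, history) is the enlarged state on which the problem is Markovian.
    """
    if t == len(REWARDS):
        return 0
    best = float("-inf")
    for a in (0, 1):
        gain = REWARDS[t][a]
        # history[0] is the action taken LAG stages ago once the window is full.
        if len(history) == LAG and history[0] == a:
            gain += BONUS
        next_history = (history + (a,))[-LAG:]
        best = max(best, gain + dp(t + 1, next_history))
    return best


optimal_by_dp = dp(0, ())
optimal_by_enumeration = brute_force()
```

Here `optimal_by_dp` and `optimal_by_enumeration` agree because the enlarged state carries exactly the history the bonus depends on. Choosing `LAG` by hand is the kind of modeling decision the abstract says the model-free agent does not have to make.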

Bibliographic record

Source
Annals of Operations Research | 2017, Issue 1 | pp. 107-131 | 25 pages
Authors
Mizutani Eiji; Dreyfus Stuart
Author affiliations

Natl Taiwan Univ Sci & Technol, 43 Keelung Rd, Taipei 106, Taiwan;

Univ Calif Berkeley, Berkeley, CA 94720 USA;

Indexing information
Original format: PDF
Language: English
Keywords
Actor-critic reinforcement learning; Recurrent neural networks; Non-Markovian dependencies
Date added to database: 2022-08-18 03:04:20

Related literature

1. Learning to control the three-link musculoskeletal arm using actor-critic reinforcement learning algorithm during reaching movement [J] . Ehsan Tahami, Amir Homayoun Jafari, Ali Fallah. Biomedical Engineering: Applications, Basis and Communications . 2014, Issue 5

2. The "Proactive" Model of Learning: Integrative Framework for Model-Free and Model-Based Reinforcement Learning Utilizing the Associative Learning-Based Proactive Brain Concept [J] . Zsuga Judit, Biro Klara, Papp Csaba. Behavioral Neuroscience . 2016, Issue 1

3. Learning Agent for a Heat-Pump Thermostat with a Set-Back Strategy Using Model-Free Reinforcement Learning [J] . Bert J. Claessens, Frederik Ruelens, Neville R. Watson. Energies . 2015, Issue 8

4. Totally model-free reinforcement learning by actor-critic Elman networks in non-Markovian domains [C] . Mizutani, E., Dreyfus, . 1998

5. Mars: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler [D] . Baheri, Betis. 2020

6. Believer-Skeptic Meets Actor-Critic: Rethinking the Role of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning [O] . Kyle Dunovan, Timothy Verstynen. 2016

7. Totally Model-Free Reinforcement Learning by Actor-Critic Elman Networks in Non-Markovian Domains [O] . Eiji Mizutani, Stuart E Dreyfus. 1998
