Traditionally, the Reinforcement Learning (RL) problem is presented as follows: An agent exists in an environment described by some set of possible states S, in which it can perform actions from some set A. Each time it performs an action a_t ∈ A in some state s_t ∈ S, the agent receives a real-valued reward r_t that indicates the immediate value of this state-action transition. This produces a sequence of states, actions, and immediate rewards. The agent's task is to learn a control policy, π:S→A, that maximizes the expected sum of rewards, typically with future rewards discounted exponentially by their delay. Unlike supervised learning, the learner is not told which actions to take, but instead must discover which actions yield the most reward by exploring and exploiting its interaction with the environment. Moreover, actions may affect not only the immediate reward but also the next state and, through it, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two major features of RL.
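The loop described above can be sketched concretely. The following is a minimal, illustrative example, not part of the original text: a tabular Q-learning agent on a hypothetical 5-state chain environment, where the agent earns a reward only on reaching the rightmost state. The environment, reward values, and hyperparameters are all assumptions chosen for illustration.

```python
import random

N_STATES = 5          # S = {0, 1, 2, 3, 4}; state 4 is terminal (the goal)
ACTIONS = [-1, +1]    # A = {move left, move right}
GAMMA = 0.9           # discount factor: future rewards decay exponentially
ALPHA = 0.5           # learning rate
EPSILON = 0.3         # exploration rate (trial-and-error search)

def step(s, a):
    """One state-action transition: returns (next state s_{t+1}, reward r_t)."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s_next == N_STATES - 1 else 0.0  # delayed reward: goal only
    return s_next, r

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: explore with probability EPSILON, else exploit Q
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r = step(s, a)
        # update toward immediate reward plus discounted future value
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy pi: S -> A for the non-terminal states
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy moves right in every state, even though only the final transition is rewarded: the discounted backups propagate the delayed reward to earlier states, which is exactly the credit-assignment problem the abstract alludes to.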