Conference paper: Symposium on Artificial Immune Systems and Immune System Modelling

Pavlovian and Instrumental Q-learning: A Rescorla-Wagner-based approach to generalization in Q-learning



Abstract

Traditionally, the Reinforcement Learning (RL) problem is presented as follows: an agent exists in an environment described by some set of possible states S, in which it can perform actions from a set A. Each time it performs an action a_t ∈ A in some state s_t ∈ S, the agent receives a real-valued reward r_t that indicates the immediate value of this state-action transition. This produces a sequence of states, actions, and immediate rewards. The agent's task is to learn a control policy, π: S → A, that maximizes the expected sum of rewards, typically with future rewards discounted exponentially by their delay. Unlike supervised learning, the learner is not told which actions to take; instead, it must discover which actions yield the most reward by exploring and exploiting its relationship with the environment. Moreover, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two major features of RL.
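
The standard Q-learning setting described in the abstract can be illustrated with a short sketch. Below is a minimal tabular Q-learning loop in Python; the environment interface (reset/step), the state and action sets, and the parameter values (learning rate alpha, discount gamma, exploration rate epsilon) are illustrative assumptions for exposition, not details taken from the paper.

    # Minimal sketch of tabular Q-learning for the setup in the abstract:
    # states S, actions A, immediate reward r_t, and a policy pi: S -> A
    # that maximizes the expected sum of exponentially discounted rewards.
    # The env object (reset()/step()) and parameter values are assumptions.
    import random
    from collections import defaultdict

    def q_learning(env, states, actions, episodes=500,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
        """Learn Q(s, a) by trial-and-error interaction with the environment."""
        Q = defaultdict(float)  # Q[(s, a)] -> estimated discounted return

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # Explore with probability epsilon, otherwise exploit the
                # current estimate (the exploration/exploitation trade-off).
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])

                s_next, r, done = env.step(a)  # immediate reward r_t

                # Delayed reward: the target bootstraps on the best action
                # in the next state, discounted by gamma.
                target = r if done else r + gamma * max(Q[(s_next, x)] for x in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s_next

        # Greedy policy pi: S -> A derived from the learned Q-values.
        policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
        return Q, policy
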

