Solving problems with large state spaces is one of the most active and difficult topics in current reinforcement learning (RL) research. This paper introduces basic principles of forgetting from memory psychology into value-function-based RL, yielding a class of forgetting algorithms suited to value-function RL. The basic concepts for solving Markov decision problems are first introduced briefly, off-policy and on-policy RL algorithms are compared, and the standard SARSA(λ) algorithm is outlined. Some characteristics of human memory and forgetting are then analyzed, a forgetting rule for the RL agent is proposed, and SARSA(λ) is extended into Forget-SARSA(λ), a variant with a forgetting function. Finally, experimental results are presented.
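For orientation, the standard SARSA(λ) baseline named above can be sketched as follows. This is a minimal textbook-style tabular version with accumulating eligibility traces, not the paper's Forget-SARSA(λ); the forgetting rule is not specified in the abstract and is deliberately left out. The toy chain environment, the function name `sarsa_lambda`, and all hyperparameter values are assumptions made only for illustration.

```python
import random


def sarsa_lambda(n_states=5, episodes=200, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating traces on a toy chain.

    Hypothetical environment: states 0..n_states-1, action 0 moves left
    (bounded at state 0), action 1 moves right. Reaching the rightmost
    state ends the episode with reward 1; all other rewards are 0.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = [[0.0, 0.0] for _ in range(n_states)]

    def choose(s):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[s])
        return rng.choice([a for a in actions if q[s][a] == best])

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return s2, (1.0 if done else 0.0), done

    for _ in range(episodes):
        # Eligibility traces are reset at the start of every episode.
        e = [[0.0, 0.0] for _ in range(n_states)]
        s = 0
        a = choose(s)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = choose(s2)
            # On-policy target: bootstrap from the action actually chosen next.
            target = r if done else r + gamma * q[s2][a2]
            delta = target - q[s][a]
            e[s][a] += 1.0  # accumulating trace
            for i in range(n_states):
                for j in actions:
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam
            s, a = s2, a2
    return q


if __name__ == "__main__":
    for state, row in enumerate(sarsa_lambda()):
        print(state, [round(v, 3) for v in row])
```

The update bootstraps from the action the agent actually selects next, which is what makes SARSA on-policy; an off-policy method such as Q-learning would use the maximum over next actions instead.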