IEEE Symposium Series on Computational Intelligence

Q-learning with experience replay in a dynamic environment

Abstract

Most research in reinforcement learning has focused on stationary environments. In this paper, we propose several adaptations of Q-learning for a dynamic environment, for both single and multiple agents. The environment consists of a grid of random rewards, where every reward is removed after a visit. We focus on experience replay, a technique that receives a lot of attention nowadays, and combine this method with Q-learning. We compare two variations of experience replay, where experiences are reused based on time or based on the obtained reward. For multi-agent reinforcement learning we compare two variations of policy representation. In the first variation the agents share a Q-function, while in the second variation both agents have a separate Q-function. Furthermore, in both variations we test the effect of reward sharing between the agents. This leads to four different multi-agent reinforcement learning algorithms, from which sharing a Q-function and sharing the rewards is the most cooperative method. The results show that in the single-agent environment both experience replay algorithms significantly outperform standard Q-learning and a greedy benchmark agent. In the multi-agent environment the highest maximum reward sum in a trial is achieved by using one Q-function and reward sharing. The highest mean reward sum is obtained with separate Q-functions and separate rewards.
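The approach described in the abstract can be illustrated with a small sketch: tabular Q-learning on a grid of random rewards that disappear once collected, combined with an experience replay buffer from which stored transitions are re-used for extra updates. This is a minimal illustration, not the authors' implementation; the grid size, hyperparameters, and the uniform (time-window-based) sampling from the buffer are assumptions for the example.

```python
import random
from collections import defaultdict, deque

class GridEnv:
    """Grid of random rewards; each reward is removed after the cell is visited."""
    def __init__(self, size=5, seed=0):
        self.size = size
        self.rng = random.Random(seed)

    def reset(self):
        # One random reward per cell; the starting cell is consumed immediately.
        self.rewards = {(x, y): self.rng.uniform(0.0, 1.0)
                        for x in range(self.size) for y in range(self.size)}
        self.pos = (0, 0)
        self.rewards.pop(self.pos, None)
        return self.pos

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right (clipped at the grid border).
        dx, dy = [(0, -1), (0, 1), (-1, 0), (1, 0)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        reward = self.rewards.pop(self.pos, 0.0)  # reward vanishes after this visit
        return self.pos, reward

def train(episodes=200, steps=50, alpha=0.1, gamma=0.9, eps=0.1,
          buffer_size=1000, replay_batch=16, seed=0):
    rng = random.Random(seed)
    env = GridEnv(seed=seed)
    Q = defaultdict(float)               # Q[(state, action)] -> value estimate
    buffer = deque(maxlen=buffer_size)   # experience replay buffer of transitions

    def q_update(s, a, r, s2):
        best_next = max(Q[(s2, b)] for b in range(4))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    returns = []
    for _ in range(episodes):
        s = env.reset()
        total = 0.0
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda b: Q[(s, b)])
            s2, r = env.step(a)
            buffer.append((s, a, r, s2))
            q_update(s, a, r, s2)
            # Replay: re-apply Q-updates on a minibatch of stored transitions.
            for bs, ba, br, bs2 in rng.sample(list(buffer),
                                              min(replay_batch, len(buffer))):
                q_update(bs, ba, br, bs2)
            total += r
            s = s2
        returns.append(total)
    return returns
```

The reward-based replay variant mentioned in the abstract would replace the uniform `rng.sample` call with sampling weighted by the stored reward; the multi-agent variants would either share the `Q` table between agents or give each agent its own copy, optionally pooling the collected rewards.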
