IEEE Symposium Series on Computational Intelligence

Q-learning with experience replay in a dynamic environment

Abstract

Most research in reinforcement learning has focused on stationary environments. In this paper, we propose several adaptations of Q-learning for a dynamic environment, for both single and multiple agents. The environment consists of a grid of random rewards, where every reward is removed after a visit. We focus on experience replay, a technique that receives a lot of attention nowadays, and combine this method with Q-learning. We compare two variations of experience replay, where experiences are reused based on time or based on the obtained reward. For multi-agent reinforcement learning we compare two variations of policy representation. In the first variation the agents share a Q-function, while in the second variation both agents have a separate Q-function. Furthermore, in both variations we test the effect of reward sharing between the agents. This leads to four different multi-agent reinforcement learning algorithms, from which sharing a Q-function and sharing the rewards is the most cooperative method. The results show that in the single-agent environment both experience replay algorithms significantly outperform standard Q-learning and a greedy benchmark agent. In the multi-agent environment the highest maximum reward sum in a trial is achieved by using one Q-function and reward sharing. The highest mean reward sum is obtained with separate Q-functions and separate rewards.
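The approach described in the abstract can be illustrated with a small sketch: tabular Q-learning on a grid of random rewards that disappear once collected, combined with an experience replay buffer from which stored transitions are re-used for extra updates. This is a minimal illustration, not the authors' implementation; the grid size, hyperparameters, and the uniform (time-window-based) sampling from the buffer are assumptions for the example.

```python
import random
from collections import defaultdict, deque

class GridEnv:
    """Grid of random rewards; each reward is removed after the cell is visited."""
    def __init__(self, size=5, seed=0):
        self.size = size
        self.rng = random.Random(seed)

    def reset(self):
        # One random reward per cell; the starting cell is consumed immediately.
        self.rewards = {(x, y): self.rng.uniform(0.0, 1.0)
                        for x in range(self.size) for y in range(self.size)}
        self.pos = (0, 0)
        self.rewards.pop(self.pos, None)
        return self.pos

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right (clipped at the grid border).
        dx, dy = [(0, -1), (0, 1), (-1, 0), (1, 0)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        reward = self.rewards.pop(self.pos, 0.0)  # reward vanishes after this visit
        return self.pos, reward

def train(episodes=200, steps=50, alpha=0.1, gamma=0.9, eps=0.1,
          buffer_size=1000, replay_batch=16, seed=0):
    rng = random.Random(seed)
    env = GridEnv(seed=seed)
    Q = defaultdict(float)               # Q[(state, action)] -> value estimate
    buffer = deque(maxlen=buffer_size)   # experience replay buffer of transitions

    def q_update(s, a, r, s2):
        best_next = max(Q[(s2, b)] for b in range(4))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    returns = []
    for _ in range(episodes):
        s = env.reset()
        total = 0.0
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda b: Q[(s, b)])
            s2, r = env.step(a)
            buffer.append((s, a, r, s2))
            q_update(s, a, r, s2)
            # Replay: re-apply Q-updates on a minibatch of stored transitions.
            for bs, ba, br, bs2 in rng.sample(list(buffer),
                                              min(replay_batch, len(buffer))):
                q_update(bs, ba, br, bs2)
            total += r
            s = s2
        returns.append(total)
    return returns
```

The reward-based replay variant mentioned in the abstract would replace the uniform `rng.sample` call with sampling weighted by the stored reward; the multi-agent variants would either share the `Q` table between agents or give each agent its own copy, optionally pooling the collected rewards.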
