Extending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models

Abstract

Hierarchical reinforcement learning (HRL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work on HRL has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. In this paper we generalize the setting of HRL to average-reward, continuous-time, and multi-agent SMDP models. We also describe experimental results from a large-scale real-world domain, attesting to the benefits of HRL generally, and to our extensions more specifically. Although in principle any HRL framework could suffice, we focus in this paper on the MAXQ framework. We describe three new hierarchical reinforcement learning algorithms: continuous-time discounted reward MAXQ, discrete-time average reward MAXQ, and continuous-time average reward MAXQ. We also investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case, which we term cooperative MAXQ, where each agent uses the same task hierarchy. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. We use two experimental testbeds to study the empirical performance of our proposed extensions. One domain is a simulated robot trash collection task. The other domain is a much larger real-world multi-agent autonomous guided vehicle (MAGV) problem. We compare the performance of our proposed algorithms with each other, with the original MAXQ method, and with standard Q-learning. In the MAGV domain, we show that our proposed extensions outperform widely used industrial heuristics, such as "first come first serve", "highest queue first", and "nearest station first".
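
For orientation, the sketch below is a minimal, hypothetical illustration (not the authors' hierarchical algorithm) of the flat, tabular average-reward SMDP Q-learning update that the continuous-time and average-reward MAXQ extensions described in the abstract build on: each transition carries a lump reward and a sojourn time tau, the gain rho (average reward per unit time) is estimated online, and the temporal-difference target subtracts rho * tau instead of applying a discount. The class name, parameter names, and the toy two-state task are illustrative assumptions, not the paper's notation or testbeds.

```python
import random
from collections import defaultdict


class AvgRewardSMDPAgent:
    """Tabular average-reward SMDP Q-learning (flat, non-hierarchical sketch)."""

    def __init__(self, actions, alpha=0.1):
        self.Q = defaultdict(float)   # Q[(state, action)] -> relative action value
        self.actions = list(actions)
        self.alpha = alpha            # step size for the Q update
        self.total_reward = 0.0       # cumulative reward on greedy transitions
        self.total_time = 0.0         # cumulative sojourn time on greedy transitions
        self.rho = 0.0                # current estimate of the average reward rate

    def greedy(self, state):
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, tau, next_state):
        """Apply one SMDP transition: lump reward `reward` earned over sojourn time `tau`."""
        was_greedy = action == self.greedy(state)
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        # Average-reward temporal difference: compare the reward earned during the
        # sojourn against rho * tau, the reward expected under the current gain estimate.
        td_error = reward - self.rho * tau + best_next - self.Q[(state, action)]
        self.Q[(state, action)] += self.alpha * td_error
        # Re-estimate the gain from greedy transitions only, as reward per unit time.
        if was_greedy:
            self.total_reward += reward
            self.total_time += tau
            self.rho = self.total_reward / self.total_time


if __name__ == "__main__":
    # Toy usage on an invented two-state service task (purely illustrative).
    random.seed(0)
    agent = AvgRewardSMDPAgent(actions=["wait", "serve"])
    state = "idle"
    for _ in range(5000):
        # Epsilon-greedy action selection.
        if random.random() < 0.1:
            action = random.choice(agent.actions)
        else:
            action = agent.greedy(state)
        tau = random.uniform(0.5, 2.0)                       # sojourn time of the action
        reward = (3.0 if action == "serve" else 0.5) * tau   # reward accrued during the sojourn
        next_state = "busy" if action == "serve" else "idle"
        agent.update(state, action, reward, tau, next_state)
        state = next_state
    print("estimated average reward rate:", round(agent.rho, 3))
```

The hierarchical algorithms in the paper apply this style of update within a task hierarchy rather than over flat state-action pairs; in the cooperative MAXQ extension, the action set at the top level(s) of the hierarchy would range over joint actions of the agents.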
