
Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning

Abstract

Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round-Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
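The abstract names two core mechanisms: round-robin scheduling of action selection and execution (which removes environment non-stationarity for each learner), and vetoes on state-action pairs that lead to undesired termination states (UTS). The Python sketch below is a minimal illustration of these two ideas only; it is not the authors' implementation. The `DRRQLAgent` class, the `env` object with its `reset()`/`step()` interface, and the UTS flag returned by `step()` are hypothetical stand-ins, and the MSAV module decomposition and the greedy message-passing coordination of local policies are omitted.

```python
import random
from collections import defaultdict

class DRRQLAgent:
    """One learner in a round-robin Q-learning loop (illustrative sketch).

    Each agent keeps its own Q-table over (state, own action) and a veto
    set of state-action pairs observed to reach a UTS.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.vetoed = set()           # vetoed (state, action) pairs

    def allowed(self, state):
        # Exclude vetoed pairs; fall back to all actions if everything is vetoed.
        acts = [a for a in self.actions if (state, a) not in self.vetoed]
        return acts or self.actions

    def choose(self, state):
        # Epsilon-greedy selection restricted to non-vetoed actions.
        acts = self.allowed(state)
        if random.random() < self.epsilon:
            return random.choice(acts)
        return max(acts, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, is_uts):
        # Veto any pair observed to lead to an undesired termination state.
        if is_uts:
            self.vetoed.add((state, action))
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


def run_episode(env, agents, max_steps=200):
    """Round-robin scheduling: exactly one agent selects and executes an
    action per time step, in a fixed turn order. `env` is a hypothetical
    environment whose step() returns (next_state, reward, done, is_uts)."""
    state = env.reset()
    for step in range(max_steps):
        agent = agents[step % len(agents)]   # fixed round-robin turn order
        action = agent.choose(state)
        next_state, reward, done, is_uts = env.step(action)
        agent.update(state, action, reward, next_state, is_uts)
        state = next_state
        if done:
            break
```

Because only one agent acts at each step, every learner observes transitions of an environment that is stationary from its point of view, which is the property the paper exploits in its convergence proof; in the full algorithm, a greedy message-passing procedure would then combine the agents' locally optimal policies into the global joint policy.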
