
CLEAN Learning to Improve Coordination and Scalability in Multiagent Systems.

Abstract

Recent advances in multiagent learning have led to exciting new capabilities spanning fields as diverse as planetary exploration, air traffic control, military reconnaissance, and airport security. Such algorithms provide a tangible benefit over traditional control algorithms in that they respond quickly, adapt to dynamic environments, and generally scale well. Unfortunately, because many existing multiagent learning methods are extensions of single-agent approaches, they are inhibited by three key issues: i) they treat the actions of other agents as "environmental noise" in order to simplify the problem, ii) they are slow to converge in large systems because the joint action space grows exponentially with the number of agents, and iii) they frequently rely on an accurate system model being readily available.

This work addresses these three issues in turn. First, we improve overall learning performance over existing state-of-the-art techniques by embracing exploration during learning rather than ignoring it or approximating it away. Within multiagent systems, exploration by individual agents significantly alters the dynamics of the environment in which all agents learn. To address this, we introduce the concept of "private" exploration, in which each agent presents a stationary baseline policy to the rest of the system, allowing the other agents to learn more efficiently. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which improve coordination and performance by using private exploration to remove the negative impact of traditional "public" exploration strategies on learning in multiagent systems. Next, we leverage the properties of CLEAN rewards that enable private exploration to let agents evaluate multiple candidate actions concurrently in "batch mode," significantly improving learning speed over the state of the art. Finally, we improve the real-world applicability of the proposed techniques by relaxing their requirements. Specifically, CLEAN rewards as developed require an accurate partial model of the system (i.e., an accurate model of the system objective) in order to be computed. Unfortunately, many real-world systems are too complex to model or are not known in advance, so an accurate system model is not available a priori. We address this shortcoming by employing model-based reinforcement learning techniques that enable each agent to construct its own approximate model of the system objective from its observations and to use this approximate model to calculate its CLEAN rewards.
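To make the reward construction concrete, the following is a minimal sketch in Python of a CLEAN-style update for a stateless coordination problem, assuming the system objective can be evaluated, or approximated from observations, as a function G of the joint action. The names G, Agent, clean_reward, and the simple value-table update are illustrative assumptions for this sketch, not the dissertation's implementation: each agent publicly executes its greedy baseline action, so other agents never observe exploratory noise, and privately scores one exploratory action offline through a counterfactual evaluation of G.

import random

def clean_reward(G, joint_action, agent_idx, explored_action):
    # Counterfactual value of agent_idx privately swapping in explored_action,
    # holding every other agent's executed (baseline) action fixed.
    counterfactual = list(joint_action)
    counterfactual[agent_idx] = explored_action
    return G(counterfactual) - G(joint_action)

class Agent:
    def __init__(self, n_actions, lr=0.1):
        self.values = [0.0] * n_actions   # per-action value estimates
        self.lr = lr

    def greedy_action(self):
        # Publicly executed action: always greedy, so no exploratory noise
        # is injected into the environment the other agents learn in.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def private_explore(self):
        # Exploratory action chosen and evaluated only privately (offline).
        return random.randrange(len(self.values))

    def update(self, action, reward):
        self.values[action] += self.lr * (reward - self.values[action])

def clean_learning_step(agents, G):
    # All agents act on their stationary baseline (greedy) policies...
    joint = [agent.greedy_action() for agent in agents]
    # ...then each agent privately evaluates an exploratory action against G.
    for i, agent in enumerate(agents):
        private_action = agent.private_explore()
        agent.update(private_action, clean_reward(G, joint, i, private_action))
    return G(joint)

Because the exploratory evaluation happens against G rather than in the environment, each agent could score several privately selected actions per step, which is the intuition behind the "batch mode" extension described above, and G itself could be replaced by a learned approximation when no accurate system model is available a priori.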

Bibliographic details

  • Author: HolmesParker, Chris.
  • Author affiliation: Oregon State University.
  • Degree grantor: Oregon State University.
  • Subject: Engineering, Mechanical.
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 161 p.
  • Total pages: 161
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:

