Multiagent learning in the presence of agents with limitations.

Abstract

Learning to act in a multiagent environment is a challenging problem. Optimal behavior for one agent depends upon the behavior of the other agents, which are learning as well. Multiagent environments are therefore non-stationary, violating the stationarity assumption that underlies traditional single-agent learning. In addition, agents in complex tasks may have limitations, such as physical constraints or designer-imposed approximations of the task that make learning tractable. Limitations prevent agents from acting optimally, which further complicates an already challenging problem. A learning agent must effectively compensate for its own limitations while exploiting the limitations of the other agents. My thesis research focuses on these two challenges, namely multiagent learning and limitations, and includes four main contributions.

First, the thesis introduces the novel concepts of a variable learning rate and the WoLF (Win or Learn Fast) principle to account for other learning agents. The WoLF principle can make rational learning algorithms converge to optimal policies, and in doing so achieves two properties, rationality and convergence, which no previous technique had achieved together. The converging effect of WoLF is proven for a class of matrix games and demonstrated empirically for a wide range of stochastic games.

Second, the thesis contributes an analysis of the effect of limitations on the game-theoretic concept of Nash equilibria. The existence of equilibria is important if multiagent learning techniques, which often depend on this concept, are to be applied to realistic problems where limitations are unavoidable. The thesis introduces a general model of the effect of limitations on agent behavior, which is used to analyze the resulting impact on equilibria. The thesis shows that equilibria do exist for a few restricted classes of games and limitations, but that, in general, even well-behaved limitations do not preserve the existence of equilibria.

Third, the thesis introduces GraWoLF, a general-purpose, scalable, multiagent learning algorithm. GraWoLF combines policy-gradient learning techniques with the WoLF variable learning rate. The effectiveness of the learning algorithm is demonstrated both in a card game with an intractably large state space and in an adversarial robot task. These two tasks are complex, and agent limitations are prevalent in both.

Fourth, the thesis describes the CMDragons robot soccer team strategy for adapting to an unknown opponent. (Abstract shortened by UMI.)
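The WoLF idea pairs a hill-climbing learner with two step sizes: a small one used while the agent is "winning" (its current policy scores better against its learned values than its historical average policy does) and a larger one while it is "losing." Below is a minimal sketch in Python of WoLF-PHC, one published instantiation of this principle; the class interface, hyperparameter values, and the matching pennies demo are illustrative assumptions, not code from the thesis.

```python
# Minimal sketch of WoLF-PHC: Q-learning plus policy hill-climbing with
# the WoLF (Win or Learn Fast) variable learning rate. Interface and
# hyperparameter values are assumed for illustration.
import numpy as np

class WoLFPHC:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, epsilon=0.05):
        # delta_lose > delta_win: adapt quickly when losing, cautiously when winning.
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)      # current policy
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)  # average policy
        self.counts = np.zeros(n_states)
        self.alpha, self.gamma, self.eps = alpha, gamma, epsilon
        self.dw, self.dl = delta_win, delta_lose
        self.nA = n_actions

    def act(self, s, rng):
        if rng.random() < self.eps:                     # epsilon-greedy exploration
            return int(rng.integers(self.nA))
        return int(rng.choice(self.nA, p=self.pi[s]))

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup.
        self.Q[s, a] += self.alpha * (r + self.gamma * self.Q[s_next].max() - self.Q[s, a])
        # Incrementally track the average policy played so far.
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]
        # WoLF test: "winning" if the current policy scores better against the
        # learned Q-values than the average policy; pick the step size accordingly.
        delta = self.dw if self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s] else self.dl
        # Hill-climb toward the greedy action, then re-project onto the simplex.
        best = self.Q[s].argmax()
        self.pi[s] -= delta / (self.nA - 1)
        self.pi[s, best] += delta + delta / (self.nA - 1)
        self.pi[s] = np.clip(self.pi[s], 0.0, 1.0)
        self.pi[s] /= self.pi[s].sum()

# Self-play on matching pennies, a single-state zero-sum matrix game.
rng = np.random.default_rng(0)
p1, p2 = WoLFPHC(1, 2), WoLFPHC(1, 2)
for _ in range(50_000):
    a1, a2 = p1.act(0, rng), p2.act(0, rng)
    r = 1.0 if a1 == a2 else -1.0      # p1 wants to match, p2 to mismatch
    p1.update(0, a1, r, 0)
    p2.update(0, a2, -r, 0)
print(p1.pi[0], p2.pi[0])              # both should hover near the (0.5, 0.5) equilibrium
```

In self-play, the larger losing step size lets an exploited agent escape quickly, while the smaller winning step size keeps it from overshooting, so the two policies should oscillate with shrinking amplitude around the mixed equilibrium.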
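To make the notion of a limitation concrete, consider restricting one player's policy space in matching pennies and searching for a mutual best response within the restricted space (a restricted equilibrium). The sketch below is an assumed example, not one from the thesis; the 70% bound and the grid resolution are chosen purely for illustration.

```python
# Illustration of a "limitation": in matching pennies, restrict the row
# player to playing Heads at least 70% of the time, then grid-search for
# a mutual best response within the restricted space.
import numpy as np

def row_payoff(p, q):
    # Row player's expected payoff when row plays Heads w.p. p and
    # column plays Heads w.p. q; the game is zero-sum.
    return (2 * p - 1) * (2 * q - 1)

grid = np.linspace(0.0, 1.0, 101)
row_space = grid[grid >= 0.7 - 1e-9]   # the limitation: P(Heads) >= 0.7

for p in row_space:
    for q in grid:
        row_best = max(row_payoff(p2, q) for p2 in row_space)
        col_best = max(-row_payoff(p, q2) for q2 in grid)
        if row_payoff(p, q) >= row_best - 1e-9 and -row_payoff(p, q) >= col_best - 1e-9:
            print(f"restricted equilibrium: p = {p:.2f}, q = {q:.2f}")
```

Here the restricted policy set is a convex interval and a restricted equilibrium survives, with the limited row player conceding value (p = 0.7 against q = 0). The abstract's negative result is precisely that such existence is not guaranteed: in general, even well-behaved limitations do not preserve equilibria.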
