
Multiagent Learning with Bargaining - A Game Theoretic Approach



Abstract

Learning in the real world occurs when an agent, which perceives its current state and takes actions, interacts with the environment, which in turn provides positive or negative feedback. The field of reinforcement learning studies such processes and attempts to find policies that map states of the world to the actions of agents so as to maximize cumulative reward over the long run. In multiagent systems, learning becomes more challenging, since the optimal action of each agent generally depends on the actions of the other agents. Most studies in multiagent learning research adopt a non-cooperative equilibrium as the learning objective. In many situations, however, the equilibrium gives both players worse payoffs than they would receive under cooperation, so the agents have a strong incentive to choose a cooperative solution instead of the non-cooperative equilibrium. In this work, we apply the Nash Bargaining Solution (NBS) to multiagent systems with unknown parameters and design a multiagent learning algorithm based on bargaining, in which the agents reach the NBS by learning through experience. We show that the solution is unique and Pareto-optimal, and we prove theoretically that the algorithm converges. In addition, we extend the work to systems of asymmetric agents with different decision-making power and design a multiagent learning algorithm with asymmetric bargaining. To evaluate these learning algorithms and compare them with existing ones, benchmark grid-world games are adopted as the simulation test-bed. The simulation results demonstrate that our learning algorithms converge to the unique Pareto-optimal solution, and that they converge faster than existing multiagent learning algorithms. Finally, we discuss an application of multiagent learning algorithms to a classic economic model known as oligopoly.
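For context, the learning objective named in the abstract has a standard closed form. The sketch below is the textbook definition of the Nash Bargaining Solution for two players, not notation taken from the thesis itself: given a feasible payoff set S and a disagreement point d = (d_1, d_2) (the payoffs the agents fall back on if bargaining fails), the NBS is the pair

\[
(u_1^*, u_2^*) = \arg\max_{(u_1, u_2) \in S,\; u_i \ge d_i} (u_1 - d_1)(u_2 - d_2).
\]

On a convex, compact feasible set this maximizer is unique and Pareto-optimal, which matches the properties the abstract claims for the learned solution. The asymmetric extension mentioned in the abstract weights the two gain terms by bargaining powers \alpha and 1 - \alpha:

\[
(u_1^*, u_2^*) = \arg\max_{(u_1, u_2) \in S,\; u_i \ge d_i} (u_1 - d_1)^{\alpha} (u_2 - d_2)^{1 - \alpha}, \qquad 0 < \alpha < 1,
\]

so that a larger \alpha gives player 1 more weight in the outcome, and \alpha = 1/2 recovers the symmetric solution.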

Bibliographic details

  • Author

    Qiao Haiyan;

  • Affiliation
  • Year 2007
  • Pages
  • Format PDF
  • Language EN
  • CLC classification
