
Time-Extended Payoffs of Collectives of Autonomous Agents




A collective is a set of self-interested agents that try to maximize their own utilities, along with a well-defined, time-extended world utility function that rates the performance of the entire system. In this paper, we use the theory of collectives to design time-extended payoff utilities for agents that are both 'aligned' with the world utility and 'learnable', i.e., the agents can readily see how their behavior affects their utility. We show that in systems where each agent aims to optimize such payoff functions, coordination arises as a byproduct of the agents selfishly pursuing their own goals. A game-theoretic analysis shows that such payoff functions have the net effect of aligning the Nash equilibrium, the Pareto optimal solution, and the world utility optimum, thus eliminating undesirable behavior such as agents working at cross-purposes. We then apply collective-based payoff functions to a token collection problem in a gridworld, where agents need to optimize the aggregate value of tokens collected across an episode of finite duration (i.e., an abstracted version of rovers on Mars collecting scientifically 'interesting' rock samples, subject to power limitations). We show that, regardless of the initial token distribution, reinforcement learning agents using collective-based payoff functions significantly outperform both 'natural' extensions of single-agent algorithms and global reinforcement learning solutions based on 'team games'.
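The contrast between a 'team game' payoff and a payoff that is both aligned and learnable can be illustrated with a small sketch. The abstract does not give the form of the collective-based payoff, so the code below assumes a difference-style utility (world utility minus the world utility obtained with the agent's actions removed), which is one form used in the collectives literature. The function names, the path representation, and the token values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def world_utility(paths: List[List[Cell]], tokens: Dict[Cell, float]) -> float:
    """Aggregate value of the distinct tokens visited by any agent in the episode."""
    visited = set()
    for path in paths:
        visited.update(path)
    return sum(value for cell, value in tokens.items() if cell in visited)


def team_game_payoff(agent: int, paths: List[List[Cell]],
                     tokens: Dict[Cell, float]) -> float:
    """Team game: every agent simply receives the full world utility."""
    return world_utility(paths, tokens)


def difference_payoff(agent: int, paths: List[List[Cell]],
                      tokens: Dict[Cell, float]) -> float:
    """Assumed difference-style payoff: world utility minus the world
    utility computed with this agent's path replaced by an empty one."""
    without_agent = paths[:agent] + [[]] + paths[agent + 1:]
    return world_utility(paths, tokens) - world_utility(without_agent, tokens)


# Hypothetical episode: two agents, three tokens, one token visited by both.
tokens = {(0, 1): 1.0, (1, 1): 2.0, (2, 2): 5.0}
paths = [
    [(0, 0), (0, 1), (1, 1)],  # agent 0
    [(1, 1), (2, 2)],          # agent 1
]

for agent in range(len(paths)):
    print(agent,
          team_game_payoff(agent, paths, tokens),
          difference_payoff(agent, paths, tokens))
```

In this toy episode both agents receive the same team-game payoff, so neither can tell how much it contributed. Under the assumed difference-style payoff, the token both agents visit earns neither of them credit, and each agent is rewarded only for tokens that would otherwise have gone uncollected, which is the sense in which such a payoff discourages agents from working at cross-purposes.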




