
Scaling Up Game Theory: Achievable Set Methods for Efficiently Solving Stochastic Games of Complete and Incomplete Information

Abstract

Multi-agent reinforcement learning (MARL) poses the same learning problem as traditional reinforcement learning (RL): How can an agent learn to maximize its rewards through interaction with its environment? Traditional RL formalizes this problem by modeling the environment as a Markov decision process (MDP), in which the outcome of the agent's actions is fully explained by the state the world is in. MDPs work well for modeling simple dynamic systems but struggle to capture the near-boundless complexity of the real world. A particularly interesting class of environments that MDPs model poorly can instead be understood as relatively simple processes whose complex dynamics arise from the presence of other agents who are also attempting to maximize their rewards. MARL addresses the RL problem in these environments using stochastic (Markov) games (Littman 1994) as the formal model.
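
For context, the two formal models the abstract contrasts have standard definitions; the following is a sketch of the usual notation (not drawn from the thesis itself), with the stochastic-game form following Littman (1994):

\text{MDP (single agent): } \langle S, A, P, R, \gamma \rangle, \qquad P : S \times A \to \Delta(S), \qquad R : S \times A \to \mathbb{R}

\text{Stochastic game (} n \text{ agents): } \langle n, S, \{A_i\}_{i=1}^{n}, P, \{R_i\}_{i=1}^{n} \rangle, \qquad P : S \times A_1 \times \cdots \times A_n \to \Delta(S), \qquad R_i : S \times A_1 \times \cdots \times A_n \to \mathbb{R}

In the MDP the single agent's reward and the state transition depend only on the current state and its own action; in the stochastic game, both the transition and each agent i's reward depend on the joint action of all agents, which is what makes the other agents part of the learning problem.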
