
HogRider: Champion Agent of Microsoft Malmo Collaborative AI Challenge


Abstract

Making optimal sequential decisions in complex multiagent systems, where agents might achieve higher utility via collaboration, remains an open challenge for self-interested agents. The Microsoft Malmo Collaborative AI Challenge (MCAC), which is designed to encourage research on various problems in Collaborative AI, takes the form of a Minecraft mini-game in which players can either work together to catch a pig or deviate from cooperation in pursuit of high scores to win the challenge. Several characteristics, namely complex interactions among agents, uncertainties, sequential decision making, and limited learning trials, make it extremely challenging to find effective strategies. We present HogRider, the champion agent of MCAC in 2017 out of 81 teams from 26 countries. One key innovation of HogRider is a generalized agent type hypothesis framework for identifying the behavior model of the other agents, which is demonstrated to be robust to observation uncertainty. On top of that, a second key innovation is a novel Q-learning approach for learning effective policies against each type of collaborating agent. Several ideas are proposed to adapt traditional Q-learning to the complexities of the challenge, including state-action abstraction to reduce the problem scale, a warm start approach that uses human reasoning to address limited learning trials, and an active greedy strategy to balance exploitation and exploration. Challenge results show that HogRider outperforms all the other teams by a significant margin, in terms of both optimality and stability.
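To make the two ingredients named in the abstract more concrete, the following is a minimal Python sketch, not HogRider's actual implementation. It assumes a plain Bayesian update for the agent type hypothesis and textbook tabular Q-learning. The type names, state and action labels, likelihood values, and function names are all hypothetical. The warm start is modeled as seeding the Q-table with hand-chosen values. Ordinary epsilon-greedy selection stands in for the "active greedy" strategy, whose details the abstract does not give.

```python
import random
from collections import defaultdict


def update_type_belief(belief, likelihoods):
    """Bayes rule over hypothesized collaborator types.

    belief:      dict type -> prior probability
    likelihoods: dict type -> P(observed action | type), supplied by the caller
    """
    posterior = {t: belief[t] * likelihoods[t] for t in belief}
    total = sum(posterior.values())
    if total == 0:
        # Observation is impossible under every hypothesis: keep the prior.
        return dict(belief)
    return {t: p / total for t, p in posterior.items()}


def make_q_table(prior_values, default=0.0):
    """Warm start (assumed form): seed Q-values from human-chosen estimates."""
    q = defaultdict(lambda: default)
    q.update(prior_values)
    return q


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on an abstracted state."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Stand-in exploration rule: explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: two collaborator types, two abstract actions.
ACTIONS = ["chase", "exit"]
belief = {"focused": 0.5, "random": 0.5}
belief = update_type_belief(belief, {"focused": 0.9, "random": 0.25})

# One Q-table per hypothesized type; act against the most likely type.
q_tables = {t: make_q_table({("near_pig", "chase"): 5.0}) for t in belief}
likely_type = max(belief, key=belief.get)
q = q_tables[likely_type]
q_update(q, "near_pig", "chase", 25.0, "done", ACTIONS, alpha=0.5, gamma=0.9)
choice = epsilon_greedy(q, "near_pig", ACTIONS, epsilon=0.0)
```

Keeping one Q-table per hypothesized type mirrors the abstract's description of learning a policy "against each type" of collaborator. The belief over types then decides which table to consult at decision time.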
