Applying machine learning techniques to an imperfect information game

Abstract

The game of poker presents a challenge to Artificial Intelligence researchers because it is a complex asymmetric information game. In such games, a player can improve his performance by inferring the private information held by the other players from their prior actions. A novel connectionist structure was designed to play a version of poker (multi-player limit Hold'em). This allows simple reinforcement learning techniques to be used which had not previously been considered for the game of multi-player Hold'em. A related hidden Markov model was designed to be fitted to records of poker play without using any private information. Belief vectors generated by this model provide a more convenient and flexible representation of an opponent's action history than alternative approaches.

The structure was tested in two settings. Firstly, self-play simulation was used to generate an approximation to a Nash equilibrium strategy. A related, but slower, rollout strategy that uses Monte-Carlo samples was used to evaluate the performance. Secondly, the structure was used to model and hence exploit a population of opponents within a relatively small number of games. When and how to adapt quickly to new opponents are open questions in poker AI research. An opponent model with a small number of discrete types is used to identify the largest differences in strategy between members of the population. A commercial software package (Poker Academy) was used to provide a population of sophisticated opponents to test against. A series of experiments was conducted to compare adaptive and static systems. All systems showed positive results but, surprisingly, the adaptive systems did not show a significant improvement over similar static systems. The possible reasons for this result are discussed.

This work formed the basis of a series of entries to the computer poker competition hosted at the annual conferences of the Association for the Advancement of Artificial Intelligence (AAAI). Its best rankings were 3rd in the 2006 6-player limit Hold'em competition and 2nd in the 2008 3-player limit Hold'em competition.
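The abstract does not give the details of the hidden Markov model, so the following is only a minimal sketch of how a belief vector over hidden opponent states could be updated from publicly observed actions using the standard HMM forward step. The two hidden states, the action set, and all probabilities are hypothetical values chosen for illustration; they are not taken from the thesis.

```python
from typing import List, Sequence

# Hypothetical observable actions; the thesis may use a different encoding.
ACTIONS = ("fold", "call", "raise")


def forward_step(
    belief: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
    action_index: int,
) -> List[float]:
    """One HMM forward step: predict the next hidden state, then
    condition on the observed action and renormalise."""
    n = len(belief)
    # Predict: push the current belief through the transition matrix.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Update: weight each state by the likelihood of the observed action.
    unnormalised = [predicted[j] * emission[j][action_index] for j in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observed action has zero probability under the model")
    return [u / total for u in unnormalised]


def belief_after(
    actions: Sequence[str],
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
) -> List[float]:
    """Fold a sequence of observed actions into a belief vector."""
    belief = list(prior)
    for action in actions:
        belief = forward_step(belief, transition, emission, ACTIONS.index(action))
    return belief


# Illustrative parameters (assumed): state 0 = "tight", state 1 = "loose".
PRIOR = [0.5, 0.5]
TRANSITION = [[0.9, 0.1], [0.1, 0.9]]
EMISSION = [
    [0.6, 0.3, 0.1],  # tight: mostly folds
    [0.2, 0.3, 0.5],  # loose: mostly raises
]

belief = belief_after(["raise", "raise"], PRIOR, TRANSITION, EMISSION)
print(belief)
```

With these made-up parameters, two observed raises shift the belief toward the second ("loose") state. The update uses only public actions, which matches the abstract's point that the model is fitted without any private information.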

Bibliographic details

  • Author

    Sweeney Néill;

  • Author affiliation
  • Year 2012
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
