We advance research on opponent modeling in interesting adversarial environments: environments in which equilibrium strategies are intractable to compute or undesirable to use. We motivate the need for opponent models in such environments by showing how successful opponent-modeling agents can exploit both nonequilibrium strategies and strategies based on equilibrium approximations. We develop a new, flexible metric that quantifies how well a model predicts the opponent's behavior independently of the performance of the agent in which it resides. We show how this metric can reveal areas of model improvement that would otherwise remain undiscovered, and we demonstrate the technique for evaluating opponent model quality in the poker domain. We introduce the idea of performance bounds for classes of opponent models, present a method for calculating them, and show that these bounds are a function of the environment alone and thus invariant over the set of all opponents an agent may face. We calculate the performance bounds for several classes of models in two domains: high card draw with simultaneous betting and a new simultaneous-move strategy game that we develop. We describe how these performance bounds can guide the selection of appropriate model classes for a given domain as well as the level of effort that should be invested in developing opponent models in those domains. We expand the set of opponent-modeling methods with new algorithms and study their performance empirically in several domains, including full-scale Texas Hold'em poker. We develop PokeMinn, a top-performing agent that learns to improve its performance by observing the opponent, even when the opponent is attempting to approximate equilibrium play. These methods also pave the way for performance optimization using genetic algorithms and efficient model queries using metareasoning.