Stochastic environments pose challenging problems for agents trying to act optimally in the presence of other agents. In such environments, agents must contend with the probabilistic effects of other agents' actions, their inability to completely observe the state of the world before selecting the next action, and, in some cases, the high cost of communication. In this paper, we show how such systems can be modeled as multi-agent Markov decision processes. We describe a policy that prescribes an action that has a high probability of being the optimal action under a given global state distribution, and we present an algorithm that agents can use to act in such environments while attempting to achieve their goals.
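As a rough illustration of the kind of policy described above, the following minimal sketch picks the action most likely to be optimal under a distribution over global states. It is one possible reading under stated assumptions, not the paper's algorithm: the names `belief`, `optimal_action`, and `most_likely_optimal_action` are hypothetical, and the per-state optimal actions are assumed to have been computed beforehand from the underlying fully observable model.

```python
from collections import defaultdict


def most_likely_optimal_action(belief, optimal_action):
    """Return the action with the largest probability of being optimal.

    belief: dict mapping each global state to its probability (assumed input).
    optimal_action: dict mapping each global state to the action that is
        optimal in that state (assumed to be precomputed elsewhere).
    """
    mass = defaultdict(float)
    for state, prob in belief.items():
        # Credit this state's probability to the action that is optimal in it.
        mass[optimal_action[state]] += prob
    # Iterate over actions in sorted order so that ties are broken deterministically.
    best = max(sorted(mass), key=lambda a: mass[a])
    return best, dict(mass)


# Hypothetical three-state example.
belief = {"s0": 0.4, "s1": 0.35, "s2": 0.25}
optimal_action = {"s0": "wait", "s1": "move", "s2": "move"}
action, mass = most_likely_optimal_action(belief, optimal_action)
print(action, mass)
```

In this toy example "move" is optimal in two states whose combined probability exceeds that of the single state where "wait" is optimal, so "move" is selected even though no single state is certain.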