Partial-monitoring games constitute a mathematical framework for sequentialdecision making problems with imperfect feedback: The learner repeatedlychooses an action, opponent responds with an outcome, and then the learnersuffers a loss and receives a feedback signal, both of which are fixedfunctions of the action and the outcome. The goal of the learner is to minimizehis total cumulative loss. We make progress towards the classification of thesegames based on their minimax expected regret. Namely, we classify almost allgames with two outcomes and finite number of actions: We show that theirminimax expected regret is either zero, $widetilde{Theta}(sqrt{T})$,$Theta(T^{2/3})$, or $Theta(T)$ and we give a simple and efficientlycomputable classification of these four classes of games. Our hope is that theresult can serve as a stepping stone toward classifying all finitepartial-monitoring games.
展开▼