We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $heta$ is larger than the sampling budget $n$. In such cases, it is in general impossible to derive sub-linear regret bounds since usual linear bandit algorithms have a regret in $O(Ksqrt{n})$. In this paper we assume that $heta$ is $S-$sparse, i.e.~has at most $S-$non-zero components, and that the space of arms is the unit ball for the $||.||_2$ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive algorithms with regret bounds in $O(Ssqrt{n})$.
展开▼