Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence-based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches.
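To make the algorithm concrete, here is a minimal sketch of the GP-UCB selection rule: at each round, pick the candidate maximizing the posterior mean plus a scaled posterior standard deviation. This is an illustration, not the paper's reference code; it assumes a squared-exponential kernel, a fixed 1-D candidate grid, and a constant exploration weight `beta` (the paper instead derives a schedule beta_t from its regret analysis). The helper names `rbf_kernel`, `gp_posterior`, and `gp_ucb` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xs, noise=1e-2):
    """GP posterior mean and standard deviation at candidate points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def gp_ucb(f, candidates, T=30, beta=2.0, noise=1e-2, rng=None):
    """Sequentially query the point maximizing mu + sqrt(beta) * sigma."""
    rng = rng or np.random.default_rng(0)
    X = [candidates[0]]
    y = [f(X[0]) + np.sqrt(noise) * rng.standard_normal()]
    for _ in range(T - 1):
        mu, sigma = gp_posterior(np.array(X), np.array(y), candidates, noise)
        x_next = candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
        X.append(x_next)
        y.append(f(x_next) + np.sqrt(noise) * rng.standard_normal())
    return np.array(X), np.array(y)

# Example: optimize a noisy 1-D function over a grid of candidates.
grid = np.linspace(0.0, 1.0, 200)
X, y = gp_ucb(lambda x: np.sin(6 * x), grid)
print("best observed point:", X[np.argmax(y)])
```

The `mu + sqrt(beta) * sigma` acquisition trades off exploitation (high posterior mean) against exploration (high posterior uncertainty); the paper's contribution is bounding the cumulative regret of this rule via the maximal information gain of the kernel.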