Stochastic local search (SLS), like many other stochastic optimization algorithms, has several parameters that must be tuned for the algorithm to find high-quality solutions quickly. In this paper, we formulate the stochastic local search bandit (SLSB), a novel learning variant of SLS based on multi-armed bandits. SLSB optimizes SLS over a sequence of stochastic optimization problems and achieves high average cumulative reward. In SLSB, we study how SLS can be optimized via low-degree polynomials in its noise and restart parameters. To learn the coefficients of these polynomials, we present polynomial Thompson Sampling (PolyTS). We derive a regret bound for PolyTS and validate its performance on synthetic problems of varying difficulty as well as on feature selection problems. By exploiting polynomial structure, PolyTS optimizes the parameters of SLS substantially better than bandits that make no assumptions about the reward function and than other parameter optimization approaches. In addition, owing to its simple model update, PolyTS has lower computational cost than other SLS parameter optimization methods.
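To make the idea concrete, the following is a minimal sketch of how Thompson sampling over a low-degree polynomial reward model might look, assuming PolyTS resembles linear Thompson sampling on polynomial features of the SLS noise and restart parameters. The quadratic reward surface, the function names, and the Bayesian linear-regression update below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def poly_features(p, r, degree=2):
    """All monomials p^i * r^j with total degree <= `degree`."""
    return np.array([p**i * r**j
                     for i in range(degree + 1)
                     for j in range(degree + 1 - i)])

def poly_ts(reward_fn, candidates, rounds=500, degree=2, sigma2=0.25):
    """Thompson sampling with a Bayesian linear model on polynomial features."""
    d = len(poly_features(*candidates[0], degree))
    B = np.eye(d)            # posterior precision matrix (identity prior)
    f = np.zeros(d)          # running sum of feature vector * observed reward
    total = 0.0
    xs = np.array([poly_features(p, r, degree) for p, r in candidates])
    for _ in range(rounds):
        mu = np.linalg.solve(B, f)                       # posterior mean
        cov = sigma2 * np.linalg.inv(B)                  # posterior covariance
        theta = np.random.multivariate_normal(mu, cov)   # sample coefficients
        k = int(np.argmax(xs @ theta))   # arm maximizing the sampled polynomial
        x = xs[k]
        reward = reward_fn(*candidates[k])   # one SLS run at these parameters
        B += np.outer(x, x)                  # conjugate linear-regression update
        f += reward * x
        total += reward
    return np.linalg.solve(B, f), total

rng = np.random.default_rng(0)

def noisy_reward(p, r):
    # Synthetic stand-in for an SLS run; best setting near p=0.3, r=0.5.
    return 1.0 - (p - 0.3)**2 - (r - 0.5)**2 + 0.1 * rng.standard_normal()

grid = [(p, r) for p in np.linspace(0, 1, 6) for r in np.linspace(0, 1, 6)]
coeffs, total = poly_ts(noisy_reward, grid)
print(coeffs.shape, total)
```

Because each round only updates a small precision matrix and a coefficient vector, the per-round cost stays low, which is consistent with the computational advantage claimed for the simple model update.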