JMLR: Workshop and Conference Proceedings

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization



Abstract

In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points. For the isotropic squared-exponential kernel in $d$ dimensions, we find that an average simple regret of $\epsilon$ requires $T = \Omega\big(\frac{1}{\epsilon^2}(\log\frac{1}{\epsilon})^{d/2}\big)$, and the average cumulative regret is at least $\Omega\big(\sqrt{T(\log T)^{d/2}}\big)$, thus matching existing upper bounds up to the replacement of $d/2$ by $2d+O(1)$ in both cases. For the Matérn-$\nu$ kernel, we give analogous bounds of the form $\Omega\big((\frac{1}{\epsilon})^{2+d/\nu}\big)$ and $\Omega\big(T^{\frac{\nu+d}{2\nu+d}}\big)$, and discuss the resulting gaps to the existing upper bounds.
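For reference, the two performance measures named in the abstract can be written out explicitly; the notation $x^*$ for a maximizer of $f$, $x^{(T)}$ for the point reported after $T$ rounds, and $x_1,\dots,x_T$ for the queried points is an assumed convention, not taken from this page:

```latex
% Simple regret: suboptimality of the single point x^{(T)} reported after T rounds
r_T = f(x^*) - f\big(x^{(T)}\big), \qquad x^* \in \operatorname*{arg\,max}_x f(x)

% Cumulative regret: summed suboptimality of the T points actually queried
R_T = \sum_{t=1}^{T} \big( f(x^*) - f(x_t) \big)
```

Under this convention, the simple-regret lower bound says that guaranteeing an average $r_T \le \epsilon$ for the squared-exponential kernel already forces the stated sample complexity in $T$, while the cumulative bound constrains $R_T$ over the whole query sequence.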

