International Conference on Algorithmic Learning Theory

An Efficient Algorithm for Learning with Semi-bandit Feedback



Abstract

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m√(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2)√(T log d)), gaining a factor of √(d/m) over previous bounds for this algorithm.
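For concreteness, the sketch below (not the authors' implementation) illustrates an FPL + Geometric Resampling loop for a toy m-set decision class, where the offline oracle simply picks the m coordinates with the smallest perturbed cumulative loss. The function names, the exponential-perturbation scale eta, and the truncation level M are illustrative assumptions; only the overall structure (perturbed leader plus resampling-based loss estimates under semi-bandit feedback) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def oracle(scores, m):
    """Offline oracle for a toy m-set decision class: select the m smallest-score coordinates."""
    v = np.zeros_like(scores)
    v[np.argsort(scores)[:m]] = 1.0
    return v

def fpl_gr(losses, m, eta, M):
    """Run FPL with Geometric Resampling on a (T, d) loss matrix under semi-bandit feedback."""
    T, d = losses.shape
    Lhat = np.zeros(d)                      # cumulative loss estimates
    total = 0.0
    for t in range(T):
        Z = rng.exponential(size=d)
        a = oracle(Lhat - Z / eta, m)       # follow the perturbed leader
        total += a @ losses[t]              # semi-bandit: only chosen coordinates are observed
        # Geometric Resampling: for each played coordinate, redraw perturbations until
        # that coordinate is chosen again; the waiting time estimates 1 / P(coordinate chosen).
        K = np.zeros(d)
        waiting = a.copy()                  # coordinates still waiting for a repeat draw
        for k in range(1, M + 1):
            if not waiting.any():
                break
            ap = oracle(Lhat - rng.exponential(size=d) / eta, m)
            hit = waiting * ap
            K += k * hit
            waiting = waiting - hit
        K += M * waiting                    # cap the resampling count at M
        Lhat += K * a * losses[t]           # loss estimate, nonzero only on observed coordinates
    return total

# Illustrative usage with synthetic losses; the tuning of eta and M is an assumption
# suggested by the stated regret rate, not taken from the paper.
T, d, m = 5000, 10, 3
losses = rng.uniform(size=(T, d))
eta = np.sqrt(np.log(d) / (d * T))
M = int(np.ceil(np.sqrt(T)))
print(fpl_gr(losses, m, eta, M))
```

The resampling count K acts as an importance weight: its expectation approximates the inverse probability of observing a coordinate, and capping it at M trades a small bias for bounded per-round computation.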
