International Conference on Algorithmic Learning Theory

An Efficient Algorithm for Learning with Semi-bandit Feedback



Abstract

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m√(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2)√(T log d)), gaining a factor of √(d/m) over previous bounds for this algorithm.
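For concreteness, the sketch below (not the authors' implementation) illustrates an FPL + Geometric Resampling loop for a toy m-set decision class, where the offline oracle simply picks the m coordinates with the smallest perturbed cumulative loss. The function names, the exponential-perturbation scale eta, and the truncation level M are illustrative assumptions; only the overall structure (perturbed leader plus resampling-based loss estimates under semi-bandit feedback) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def oracle(scores, m):
    """Offline oracle for a toy m-set decision class: select the m smallest-score coordinates."""
    v = np.zeros_like(scores)
    v[np.argsort(scores)[:m]] = 1.0
    return v

def fpl_gr(losses, m, eta, M):
    """Run FPL with Geometric Resampling on a (T, d) loss matrix under semi-bandit feedback."""
    T, d = losses.shape
    Lhat = np.zeros(d)                      # cumulative loss estimates
    total = 0.0
    for t in range(T):
        Z = rng.exponential(size=d)
        a = oracle(Lhat - Z / eta, m)       # follow the perturbed leader
        total += a @ losses[t]              # semi-bandit: only chosen coordinates are observed
        # Geometric Resampling: for each played coordinate, redraw perturbations until
        # that coordinate is chosen again; the waiting time estimates 1 / P(coordinate chosen).
        K = np.zeros(d)
        waiting = a.copy()                  # coordinates still waiting for a repeat draw
        for k in range(1, M + 1):
            if not waiting.any():
                break
            ap = oracle(Lhat - rng.exponential(size=d) / eta, m)
            hit = waiting * ap
            K += k * hit
            waiting = waiting - hit
        K += M * waiting                    # cap the resampling count at M
        Lhat += K * a * losses[t]           # loss estimate, nonzero only on observed coordinates
    return total

# Illustrative usage with synthetic losses; the tuning of eta and M is an assumption
# suggested by the stated regret rate, not taken from the paper.
T, d, m = 5000, 10, 3
losses = rng.uniform(size=(T, d))
eta = np.sqrt(np.log(d) / (d * T))
M = int(np.ceil(np.sqrt(T)))
print(fpl_gr(losses, m, eta, M))
```

The resampling count K acts as an importance weight: its expectation approximates the inverse probability of observing a coordinate, and capping it at M trades a small bias for bounded per-round computation.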
