Online Learning with Vector Costs and Bandits with Knapsacks

Thomas Kesselheim; Sahil Singla

Abstract

We introduce online learning with vector costs ($OLVC_p$), where in each time step $t \in \{1,\ldots,T\}$ we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^d$. The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for $d=1$, and is interesting for general $d$ because of applications like online scheduling, where we want to balance the load between different machines (dimensions). We study $OLVC_p$ in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from $d$ dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight $O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The $OLVC_p$ problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BWK) problem. This connection allows us to use our $OLVC_p$ techniques to obtain (near) optimal results for BWK in both stochastic and adversarial settings. In particular, we obtain a tight $O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial BWK, which improves over the $O(d \cdot \log T)$ competitive ratio algorithm of Immorlica et al. (2019).
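
The objective described above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' algorithm: the names `lp_norm`, `olvc_objective`, `lp_gradient` and `scalarized_hedge` are hypothetical. `scalarized_hedge` is a generic full-feedback heuristic assumed here for illustration (Hedge, i.e. multiplicative weights, run on scalar costs obtained by weighting each action's cost vector with the gradient of the $\ell_p$ norm of the current load); the abstract does not describe how the paper's $d$-to-1 reduction works, so this should not be read as that procedure. The block also checks one standard fact that is consistent with the $\min\{p, \log d\}$ form of the adversarial bound: $\|x\|_\infty \le \|x\|_p \le d^{1/p}\|x\|_\infty$, and $d^{1/p} \le e$ for every $p \ge \ln d$, so for large $p$ the $\ell_p$ objective is within a constant factor of the maximum load.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def lp_norm(x: Vector, p: float) -> float:
    """ell_p norm of x; p = math.inf gives the max-norm (largest load)."""
    if math.isinf(p):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def olvc_objective(incurred: Sequence[Vector], p: float) -> float:
    """OLVC_p objective: ell_p norm of the coordinate-wise sum of incurred cost vectors."""
    totals = [sum(column) for column in zip(*incurred)]
    return lp_norm(totals, p)


def lp_gradient(load: Vector, p: float) -> List[float]:
    """Gradient of ||load||_p for finite p >= 1 and nonnegative load.

    Hypothetical convention: at the all-zero load (where the norm is not
    differentiable) every coordinate gets weight 1.
    """
    norm = lp_norm(load, p)
    if norm == 0.0:
        return [1.0] * len(load)
    return [(v / norm) ** (p - 1) for v in load]


def scalarized_hedge(
    cost_rounds: Sequence[Sequence[Vector]], p: float, eta: float, seed: int = 0
) -> Tuple[List[Vector], List[float]]:
    """Hypothetical full-feedback learner, NOT the paper's reduction.

    cost_rounds[t][i] is the d-dimensional cost vector of action i at step t;
    all of them are revealed after the learner commits (full feedback).
    Returns the cost vectors the learner incurred and its final action distribution.
    """
    rng = random.Random(seed)
    n = len(cost_rounds[0])
    d = len(cost_rounds[0][0])
    weights = [1.0 / n] * n
    load = [0.0] * d
    incurred: List[Vector] = []
    for costs in cost_rounds:
        # Sample an action from the current Hedge distribution.
        i = rng.choices(range(n), weights=weights)[0]
        # Collapse each action's cost vector to a scalar: its first-order
        # effect on ||load||_p, using the gradient at the current load.
        grad = lp_gradient(load, p)
        scalar = [sum(g * c for g, c in zip(grad, costs[a])) for a in range(n)]
        # Multiplicative-weights update on the scalar costs, then renormalize.
        weights = [w * math.exp(-eta * s) for w, s in zip(weights, scalar)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Only the chosen action's cost vector is added to the learner's load.
        load = [x + c for x, c in zip(load, costs[i])]
        incurred.append(costs[i])
    return incurred, weights


if __name__ == "__main__":
    # Toy data: d = 2 machines, T = 3 steps.
    print(olvc_objective([[1, 0], [0, 1], [1, 0]], p=2))  # sqrt(5), about 2.236
    # For p = ln d the ell_p and max norms differ by at most d**(1/p) = e.
    d = 1000
    print(d ** (1 / math.log(d)))  # approximately math.e
    # Action 0 loads both machines, action 1 is free: Hedge should move to action 1.
    rounds = [[[1.0, 1.0], [0.0, 0.0]] for _ in range(50)]
    _, final = scalarized_hedge(rounds, p=2, eta=0.5)
    print(final[1] > 0.99)  # True
```

With `p=1` the objective is the total load and with `p=math.inf` it is the largest machine load, which is the load-balancing reading of the abstract. The gradient weighting is used only because it is a simple way to turn a vector cost into a scalar one; the learning rate `eta` is an arbitrary toy value.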

Bibliographic Details

Source: JMLR: Workshop and Conference Proceedings | 2020, issue 2010 | 20 pages
Authors: Thomas Kesselheim; Sahil Singla
Original format: PDF

Similar Articles

1. Preference-based Online Learning with Dueling Bandits: A Survey [J]. Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul. Journal of Machine Learning Research, 2021.
2. New bounds on the price of bandit feedback for mistake-bounded online multiclass learning [J]. Philip M. Long. Theoretical Computer Science, 2020.
3. Bandit Online Learning with Unknown Delays [J]. Bingcong Li, Tianyi Chen, Georgios B. Giannakis. JMLR: Workshop and Conference Proceedings, 2018, no. 12.
4. Online Knapsack Problem with Removal Cost [C]. Xin Han, Yasushi Kawase, Kazuhisa Makino. Computing and Combinatorics, 2012.
5. Efficient Online Learning with Bandit Feedback [D]. Fang Liu. 2020.
6. The Costs of Online Learning: Examining Differences in Motivation and Academic Outcomes in Online and Face-to-Face Community College Developmental Mathematics Courses [O]. Michelle K. Francis, Stephanie V. Wormington, Chris Hulleman. 1993.
7. A note on the price of bandit feedback for mistake-bounded online learning [O]. Jesse Geneson. 2021.