Source: JMLR: Workshop and Conference Proceedings

Online Learning with Vector Costs and Bandits with Knapsacks



Abstract

We introduce online learning with vector costs ($OLVC_p$) where in each time step $t \in \{1,\ldots,T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^d$. The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for $d=1$, and is interesting for general $d$ because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study $OLVC_p$ in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from $d$ dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight $O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The $OLVC_p$ problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BWK) problem. This connection allows us to use our $OLVC_p$ techniques to obtain (near) optimal results for BWK in both stochastic and adversarial settings. In particular, we obtain a tight $O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial BWK, which improves over the $O(d \cdot \log T)$ competitive ratio algorithm of Immorlica et al. (2019).
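To make the objective in the abstract concrete, the following is a minimal Python sketch that only evaluates the $\ell_p$ norm of the summed cost vectors for a fixed action sequence. It is not the paper's algorithm or its reduction to one dimension. The names `lp_norm` and `olvc_objective` and the toy two-machine instance are hypothetical, chosen here for illustration.

```python
import math


def lp_norm(vec, p):
    """l_p norm of a nonnegative vector; p = math.inf gives the max norm."""
    if math.isinf(p):
        return max(vec)
    return sum(v ** p for v in vec) ** (1.0 / p)


def olvc_objective(costs, actions, p):
    """l_p norm of the total cost vector incurred by a sequence of actions.

    costs[t][i] is the cost vector in [0,1]^d of action i at time step t
    (an assumed layout for this sketch); actions[t] is the action played.
    """
    d = len(costs[0][0])
    total = [0.0] * d
    for t, i in enumerate(actions):
        for j in range(d):
            total[j] += costs[t][i][j]
    return lp_norm(total, p)


# Toy instance (hypothetical): T = 4 steps, n = 2 actions, d = 2 machines.
# Action 0 loads only machine 0, action 1 loads only machine 1.
costs = [[(1.0, 0.0), (0.0, 1.0)] for _ in range(4)]

always_first = olvc_objective(costs, [0, 0, 0, 0], p=2)  # total load (4, 0)
balanced = olvc_objective(costs, [0, 1, 0, 1], p=2)      # total load (2, 2)
makespan = olvc_objective(costs, [0, 1, 0, 1], p=math.inf)
```

In this toy instance the balanced sequence has the smaller $\ell_2$ objective, which is the load-balancing behaviour the scheduling application asks for. With $d=1$ the objective reduces to the plain sum of scalar costs, matching the classical setting. The appearance of $\min\{p, \log d\}$ is consistent with the standard fact that for $p \ge \ln d$ the $\ell_p$ norm is within a constant factor of the $\ell_\infty$ norm, since $\|x\|_\infty \le \|x\|_p \le d^{1/p}\|x\|_\infty$.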
