首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Exponential Weights on the Hypercube in Polynomial Time
【24h】

Exponential Weights on the Hypercube in Polynomial Time

机译:多项式时间内超立方体上的指数权重

获取原文
           

摘要

We study a general online linear optimization problem(OLO). At each round, a subset of objects from a fixed universe of $n$ objects is chosen, and a linear cost associated with the chosen subset is incurred. To measure the performance of our algorithms, we use the notion of regret which is the difference between the total cost incurred over all iterations and the cost of the best fixed subset in hindsight. We consider Full Information and Bandit feedback for this problem. This problem is equivalent to OLO on the ${0,1}^n$ hypercube. The Exp2 algorithm and its bandit variant are commonly used strategies for this problem. It was previously unknown if it is possible to run Exp2 on the hypercube in polynomial time. In this paper, we present a polynomial time algorithm called PolyExp for OLO on the hypercube. We show that our algorithm is equivalent Exp2 on ${0,1}^n$, Online Mirror Descent(OMD), Follow The Regularized Leader(FTRL) and Follow The Perturbed Leader(FTPL) algorithms. We show PolyExp achieves expected regret bound that is a factor of $sqrt{n}$ better than Exp2 in the full information setting under $L_infty$ adversarial losses. Because of the equivalence of these algorithms, this implies an improvement on Exp2’s regret bound in full information. We also show matching regret lower bounds. Finally, we show how to use PolyExp on the ${-1,+1}^n$ hypercube, solving an open problem in Bubeck et al (COLT 2012).
机译:我们研究了一个一般的在线线性优化问题(OLO)。在每个回合中,从$ n $个对象的固定宇宙中选择一个对象子集,并产生与所选子集相关的线性成本。为了衡量我们算法的性能,我们使用遗憾的概念,即所有迭代中产生的总成本与事后看来最佳固定子集的成本之间的差。我们考虑此问题的完整信息和强盗反馈。此问题等效于$ {0,1 } ^ n $超立方体上的OLO。 Exp2算法及其强盗变体是解决此问题的常用策略。以前未知是否可以在多项式时间内在超立方体上运行Exp2。在本文中,我们为超立方体上的OLO提出了一种称为PolyExp的多项式时间算法。我们证明了我们的算法在$ {0,1 } ^ n $,在线镜像下降(OMD),遵循正则化领导者(FTRL)和遵循扰动领导者(FTPL)算法上等效于Exp2。我们显示,PolyExp在$ L_ infty $对抗损失下的完整信息环境中达到了预期的后悔界限,这比Exp2好$ sqrt {n} $。由于这些算法的等效性,这意味着Exp2在完整信息范围内的遗憾得到了改善。我们还显示了匹配的遗憾下限。最后,我们展示了如何在$ {-1,+ 1 } ^ n $超立方体上使用PolyExp,解决了Bubeck等人(COLT 2012)中的一个开放问题。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号