Source: JMLR: Workshop and Conference Proceedings

Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Abstract

We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.
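To make the setting concrete, here is a minimal Python sketch of the two ingredients the abstract names: a relaxed $G$-optimal design over the arms and an estimate of $\theta$ built from it. It is not the paper's algorithm. The paper's randomized estimator is replaced by ordinary least squares on arms sampled from the design, and the design is computed with a standard Frank-Wolfe (Fedorov-Wynn) iteration on the D-optimal objective, which by the Kiefer-Wolfowitz theorem has the same optimum as the relaxed $G$-optimal problem. The function names, the tolerance, the unit-variance Gaussian noise, and the fixed budget `n` are all assumptions made for illustration.

```python
import numpy as np


def g_optimal_design(X, tol=1e-2, max_iter=10_000):
    """Approximate the relaxed G-optimal design over the rows of X.

    X has shape (K, d) and its rows (arm features) must span R^d.
    Returns weights lam on the simplex such that
    max_i x_i^T A(lam)^{-1} x_i <= (1 + tol) * d when the loop converges,
    where A(lam) = sum_i lam_i x_i x_i^T.
    """
    K, d = X.shape
    lam = np.full(K, 1.0 / K)  # start from the uniform design
    for _ in range(max_iter):
        A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
        # g[i] = x_i^T A^{-1} x_i, the prediction variance at arm i
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)
        j = int(np.argmax(g))
        if g[j] <= d * (1.0 + tol):  # Kiefer-Wolfowitz: optimum has max g = d
            break
        # Exact line-search step toward the vertex e_j for log det A(lam)
        step = (g[j] / d - 1.0) / (g[j] - 1.0)
        lam *= 1.0 - step
        lam[j] += step
    return lam / lam.sum()


def estimate_best_arm(X, theta, lam, n=2000, seed=0):
    """Simulate n pulls allocated by lam and return the empirical best arm.

    Assumption: rewards are x^T theta plus standard Gaussian noise, and
    theta is estimated by plain least squares (a stand-in estimator).
    """
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, lam)  # random allocation of the budget
    pulls = np.repeat(np.arange(len(X)), counts)
    y = X[pulls] @ theta + rng.normal(size=n)
    theta_hat, *_ = np.linalg.lstsq(X[pulls], y, rcond=None)
    return int(np.argmax(X @ theta_hat))
```

For example, with arms `X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.6, 0.6, 0]], dtype=float)` and `theta = np.array([1.0, 0.0, 0.0])`, calling `estimate_best_arm(X, theta, g_optimal_design(X))` simulates one fixed-budget run. The sketch has no stopping rule or confidence guarantee, which is where the sample-complexity analysis of the paper would come in.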
