Source: JMLR: Workshop and Conference Proceedings

Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates



Abstract

In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tighter bound in both the covariate dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments that benchmark the proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm compares favorably with the benchmark algorithms, especially when the data are highly sparse or when the sample size is not too small.
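The abstract does not spell out the penalty or the 2-step Weighted Lasso procedure, so the following is only a minimal sketch under stated assumptions. It uses the standard definition of the minimax concave penalty and reads "2-step" as a local linear approximation: a plain Lasso fit first, then a second Lasso whose per-coefficient weights are the MCP derivative evaluated at the first-step estimate. It works on a fixed batch of samples with a least-squares loss rather than inside a bandit loop, and the function names, the tuning values `lam` and `a`, and the coordinate-descent solver are illustrative choices, not taken from the paper.

```python
def mcp_penalty(t, lam, a):
    """Standard MCP value: lam*|t| - t^2/(2a) if |t| <= a*lam, else a*lam^2/2."""
    t = abs(t)
    return lam * t - t * t / (2 * a) if t <= a * lam else a * lam * lam / 2


def mcp_derivative(t, lam, a):
    """Derivative of the MCP with respect to |t|: max(0, lam - |t|/a)."""
    return max(0.0, lam - abs(t) / a)


def soft_threshold(z, thr):
    """Shrink z toward zero by thr; values within [-thr, thr] become 0."""
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0


def weighted_lasso(X, y, weights, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + sum_j weights[j]*|b_j|.

    X is a list of n rows (each a list of d floats); y is a list of n floats.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual, with coordinate j added back.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def two_step_weighted_lasso(X, y, lam, a=3.0):
    """Assumed reading of the 2-step procedure (local linear approximation of MCP)."""
    d = len(X[0])
    # Step 1: ordinary Lasso, i.e. every coefficient gets the same weight lam.
    beta_init = weighted_lasso(X, y, [lam] * d)
    # Step 2: reweight with the MCP derivative; large coefficients get weight 0.
    weights = [mcp_derivative(b, lam, a) for b in beta_init]
    return weighted_lasso(X, y, weights)
```

In this sketch, coefficients that the first step estimates above `a * lam` in magnitude are left unpenalized in the second step, which is the intuition behind an MCP estimator behaving like an oracle least-squares fit on the significant covariates. The non-i.i.d. sampling that the paper handles is outside the scope of this example.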
