
Robust Regression via Hard Thresholding



Abstract

We study the problem of Robust Least Squares Regression (RLSR), in which several response variables can be adversarially corrupted. More specifically, for a data matrix X ∈ R^(p×n) and an underlying model w^*, the response vector is generated as y = X^T w^* + b, where b ∈ R^n is a corruption vector supported on at most C·n coordinates. Existing exact recovery results for RLSR focus solely on L_1-penalty based convex formulations and impose relatively strict model assumptions, such as requiring the corruptions b to be selected independently of X. In this work, we study a simple hard-thresholding algorithm called TORRENT which, under mild conditions on X, can recover w^* exactly even if b corrupts the response variables in an adversarial manner, i.e. both the support and the entries of b are selected adversarially after observing X and w^*. Our results hold under deterministic assumptions which are satisfied if X is sampled from any sub-Gaussian distribution. Moreover, unlike existing results that apply only to a fixed w^* generated independently of X, our results are universal and hold for any w^* ∈ R^p. Next, we propose gradient descent-based extensions of TORRENT that scale efficiently to large problems, such as high-dimensional sparse recovery, and prove similar recovery guarantees for these extensions. Empirically, we find that TORRENT, and even more so its extensions, offer significantly faster recovery than state-of-the-art L_1 solvers. For instance, even on moderate-sized datasets (with p = 50K) with around 40% corrupted responses, a variant of our proposed method called TORRENT-HYB is more than 20× faster than the best L_1 solver.
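The alternating structure described above — fit a least-squares model on the points currently believed clean, then hard-threshold the residuals to re-select that set — can be sketched as follows. This is an illustrative rendering of a TORRENT-style fully-corrective iteration under the noiseless model y = X^T w^* + b, not the authors' reference implementation; the function name, the corruption-fraction parameter `beta`, and the synthetic data are hypothetical.

```python
import numpy as np

def torrent_fc(X, y, beta, n_iters=50):
    """TORRENT-style fully-corrective sketch.

    X : (n, p) design matrix (rows are data points)
    y : (n,) responses, of which at most a beta fraction are corrupted
    beta : assumed upper bound on the fraction of corrupted responses
    """
    n, p = X.shape
    k = n - int(np.ceil(beta * n))   # size of the retained "clean" active set
    S = np.arange(n)                 # start with all points active
    w = np.zeros(p)
    for _ in range(n_iters):
        # Fully-corrective step: exact least squares on the active set only.
        w, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)
        # Hard thresholding: keep the k points with smallest absolute residual.
        r = np.abs(y - X @ w)
        S = np.argsort(r)[:k]
    return w

# Tiny synthetic example: 20% of responses get large corruptions.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
w_star = rng.normal(size=p)
y = X @ w_star
y[:40] += 10 * rng.normal(size=40)   # corrupt the first 40 responses
w_hat = torrent_fc(X, y, beta=0.25)
```

Because the uncorrupted responses fit w^* exactly in this noiseless setting, once the active set contains only clean points the least-squares step recovers w^* to machine precision, which mirrors the exact-recovery guarantee stated in the abstract.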
