Journal: Statistics and Computing

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking


Abstract

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a "no panacea" view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
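
To make the three goals named in the abstract concrete, the following is a minimal sketch of one synthetic scenario, not the authors' code or experimental design. It assumes equicorrelated Gaussian covariates, a sparse coefficient vector, and fixed penalty values chosen only for illustration (a real comparison would tune them, for example by cross-validation). The function names `simulate`, `ridge`, `lasso` and `evaluate`, and all parameter values, are hypothetical.

```python
import numpy as np

def simulate(n=100, p=200, s=5, signal=2.0, rho=0.0, seed=0):
    """One synthetic scenario: n samples, p covariates, s nonzero coefficients."""
    rng = np.random.default_rng(seed)
    # A shared latent factor gives pairwise correlation rho between covariates.
    z = rng.standard_normal((n, 1))
    X = np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = signal                      # the first s variables are the true signals
    y = X @ beta + rng.standard_normal(n)  # unit-variance Gaussian noise
    return X, y, beta

def ridge(X, y, lam):
    """Closed-form minimiser of ||y - Xb||^2 / (2n) + (lam / 2) ||b||_2^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for ||y - Xb||^2 / (2n) + lam ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                           # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]            # residual with coordinate j removed
            c = X[:, j] @ r / n
            b[j] = np.sign(c) * max(abs(c) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b

def evaluate(b, beta, X_test, y_test):
    """Score one estimate on prediction, variable selection and ranking."""
    truth = beta != 0
    s = int(truth.sum())
    selected = b != 0
    top = np.argsort(-np.abs(b))[:s]       # the s highest-ranked variables by |b_j|
    return {
        "test_mse": float(np.mean((y_test - X_test @ b) ** 2)),  # prediction
        "n_selected": int(selected.sum()),                       # selection
        "tpr": float((selected & truth).sum() / s),              # selection
        "top_s_hits": float(truth[top].mean()),                  # ranking
    }

X, y, beta = simulate(seed=0)
X_test, y_test, _ = simulate(n=1000, seed=1)   # fresh draw from the same model

results = {
    "lasso": evaluate(lasso(X, y, lam=0.3), beta, X_test, y_test),
    "ridge": evaluate(ridge(X, y, lam=1.0), beta, X_test, y_test),
}
for name, res in results.items():
    print(name, res)
```

Ridge regression returns a dense estimate, so its selection metrics are uninformative here; it can still be compared on prediction and on ranking by coefficient magnitude. Varying `n`, `p`, `s`, `signal` and `rho` corresponds to the factors the abstract lists (sample size, dimensionality, sparsity, signal strength and multicollinearity).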
