Hypothesis Testing in High-Dimensional Regression Under the Gaussian Random Design Model: Asymptotic Theory

Javanmard A.; Montanari A.


Abstract

We consider linear regression in the high-dimensional regime where the number of observations \(n\) is smaller than the number of parameters \(p\). A very successful approach in this setting uses \(\ell_1\)-penalized least squares (also known as the Lasso) to search for a subset of \(s_0 < n\) parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this approach. In this paper, we consider instead the fundamental, but far less understood, question of statistical significance. More precisely, we address the problem of computing p-values for single regression coefficients. On one hand, we develop a general upper bound on the minimax power of tests with a given significance level. We show that rigorous guarantees for earlier methods do not allow one to achieve this bound, except in special cases. On the other hand, we prove that this upper bound is (nearly) achievable through a practical procedure in the case of random design matrices with independent entries. Our approach is based on a debiasing of the Lasso estimator. The analysis builds on a rigorous characterization of the asymptotic distribution of the Lasso estimator and its debiased version. Our result holds for optimal sample size, i.e., when \(n\) is at least on the order of \(s_0 \log(p/s_0)\). We generalize our approach to random design matrices with independent identically distributed Gaussian rows \(\boldsymbol{x}_i \sim \mathsf{N}(0, \boldsymbol{\Sigma})\). In this case, we prove that a similar distributional characterization (termed standard distributional limit) holds for \(n\) much larger than \(s_0 (\log p)^2\). Our analysis assumes \(\boldsymbol{\Sigma}\) is known. To cope with unknown \(\boldsymbol{\Sigma}\), we suggest a plug-in estimator for sparse covariances \(\boldsymbol{\Sigma}\) and validate the method through numerical simulations. Finally, we show that for optimal sample size, \(n\) being at least of order \(s_0 \log(p/s_0)\), the standard distributional limit for general Gaussian designs can be derived from the replica heuristics in statistical physics. This derivation suggests a stronger conjecture than the result we prove, and near-optimality of the statistical power for a large class of Gaussian designs.
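To illustrate the debiasing idea mentioned in the abstract, here is a minimal Python sketch of a generic debiased-Lasso p-value computation. It is not the procedure from the paper: it assumes a known covariance \(\boldsymbol{\Sigma} = I\), a known noise level `sigma`, and the plain correction \(\hat\theta + \frac{1}{n}\boldsymbol{\Sigma}^{-1} X^{\mathsf T}(y - X\hat\theta)\), with no additional scaling factor or noise estimation. The function names (`lasso_ista`, `debiased_pvalues`), the choice of `lam`, and the simulated data are all hypothetical choices made for this example.

```python
import math
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1, applied elementwise.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    # Minimize (1/2n) * ||y - X theta||^2 + lam * ||theta||_1
    # by proximal gradient descent (ISTA).
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ theta) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

def debiased_pvalues(X, y, lam, sigma, Sigma_inv):
    # Assumption: Sigma_inv (inverse design covariance) and sigma are known.
    n, p = X.shape
    theta_hat = lasso_ista(X, y, lam)
    resid = y - X @ theta_hat
    # Generic debiasing step: add back a correction built from the residuals.
    theta_u = theta_hat + Sigma_inv @ (X.T @ resid) / n
    # Nominal standard error; it ignores the extra variability that
    # appears when n is not large relative to the sparsity level.
    se = sigma * np.sqrt(np.diag(Sigma_inv) / n)
    z = theta_u / se
    # Two-sided Gaussian p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return theta_hat, theta_u, pvals

# Simulated example with n < p and s0 nonzero coefficients (illustrative values).
rng = np.random.default_rng(0)
n, p, s0, sigma = 200, 300, 5, 1.0
X = rng.standard_normal((n, p))          # rows x_i ~ N(0, I)
theta0 = np.zeros(p)
theta0[:s0] = 2.0
y = X @ theta0 + sigma * rng.standard_normal(n)

lam = sigma * math.sqrt(2.0 * math.log(p) / n)  # a common heuristic choice
theta_hat, theta_u, pvals = debiased_pvalues(X, y, lam, sigma, np.eye(p))
print("p-values of the true signals:", pvals[:s0])
```

In this sketch the p-values for the five strong coefficients should come out very small, while the nominal standard error makes the p-values of the null coordinates only approximately calibrated. A faithful implementation would need the variance characterization and noise-level estimation developed in the paper itself.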


Bibliographic details

Source
Information Theory, IEEE Transactions on | 2014, Issue 10 | pp. 6522-6554 | 33 pages
Authors
Javanmard A.; Montanari A.
Affiliation

Department of Electrical Engineering, Stanford University, Stanford, CA, USA

Original format: PDF
Language: English
Keywords
Covariance matrices; Estimation; Linear regression; Noise; Standards; Testing; Upper bound; High-dimensional regression; Lasso; hypothesis testing; p-value; uncertainty assessment


