ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS

Abstract

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries by such data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions on exogeneity of covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable Y with the best s linear combinations of p covariates X, even when X and Y are independent. When the covariance matrix of X possesses the restricted eigenvalue property, we derive such distributions for both finite s and diverging s, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of X. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where residuals are from regularized fits. Our approach is then applied to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining, and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated by both numerical examples and real data analysis.
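
The simplest case of the maximum spurious correlation, s = 1, can be illustrated by simulation. The sketch below is not the authors' procedure: it assumes independent standard Gaussian covariates (identity covariance), so it sidesteps the unknown covariance matrix that the multiplier bootstrap in the abstract is designed to handle. All function names are hypothetical and chosen only for this illustration.

```python
import math
import random


def sample_correlation(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n, p, rng):
    """Largest |corr(Y, X_j)| over p covariates generated independently of Y.

    Assumption for this sketch: Y and every X_j are i.i.d. N(0, 1).
    """
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x_j = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, abs(sample_correlation(x_j, y)))
    return best


def upper_quantile(n, p, level=0.95, reps=100, seed=0):
    """Monte Carlo estimate of the `level` quantile of the s = 1 maximum."""
    rng = random.Random(seed)
    draws = sorted(max_spurious_correlation(n, p, rng) for _ in range(reps))
    index = min(reps - 1, math.ceil(level * reps) - 1)
    return draws[index]


if __name__ == "__main__":
    # With n fixed, the simulated upper limit grows as p increases,
    # even though no covariate is related to the response.
    for p in (10, 100, 400):
        print(p, round(upper_quantile(n=50, p=p), 3))
```

A selected covariate whose observed correlation with the response falls below such a simulated upper limit is not distinguishable from pure noise at that level, which is the baseline role the abstract assigns to the upper confidence limit.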

Bibliographic record

Journal name: other
Authors: Jianqing Fan; Qi-Man Shao; Wen-Xin Zhou
Volume (issue): 46(3)
Pages: 989–1017
Total pages: 36
Original format: PDF

Similar literature

1. ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS [J]. Fan Jianqing, Shao Qi-Man, Zhou Wen-Xin. The Annals of Statistics: An Official Journal of the Institute of Mathematical Statistics. 2018, No. 3
2. Spurious Latent Class Problem in the Mixed Rasch Model: A Comparison of Three Maximum Likelihood Estimation Methods under Different Ability Distributions [J]. Sedat Sen. International Journal of Testing: Official Journal of the International Test Commission. 2018, No. 1
3. Use of the Principles of Maximum Entropy and Maximum Relative Entropy for the Determination of Uncertain Parameter Distributions in Engineering Applications [J]. José-Luis Muñoz-Cobo, Rafael Mendizábal, Arturo Miquel. Entropy. 2017, No. 9
4. The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery [C]. Sebastian Dalleiger, Jilles Vreeken. IEEE International Conference on Data Mining. 2020
5. A Discrete Model to Predict the Particle Size Distribution of Metal Powders to Yield the Maximum Density for Additive Manufacturing Applications [D]. Damptey, Ransford Kenya. 2018
6. Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery [O]. Catherine Tuglus, Mark J. van der Laan
7. Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications [O]. Fan, Jianqing, Shao, Qi-Man, Zhou, Wen-Xin. 2017
8. Application of Maximum Entropy Analysis to ISAR Imagery and Spurious Scatterer Location in Anechoic Chambers [R]. Borden, B. 1989
