Parallel Testing, And Variable Selection - A Mixture-Model Approach With Applications In Biostatistics

Abstract

We develop efficient and powerful statistical methods for high-dimensional data, where the sample size is much smaller than the number of features (the so-called 'large p, small n' problem). We address three problems. First, we develop a mixture-model approach for parallel testing for unequal variances in two-sample experiments. The treatment effect on the variance has received little attention in the statistical literature, which has so far focused mostly on the effect on the mean. The effect on the variance is increasingly recognized in the recent biological literature, and we develop an empirical Bayes approach for testing differences in variance when the number of tests is large. We show that the model is useful in a wide range of applications, that our method is much more powerful than traditional tests for unequal variances, and that it is robust to violations of the normality assumption. Second, we extend these ideas and develop a novel bivariate normal model that tests for both differential expression and differential variation between the two groups. We show in simulations that this new method yields a substantial gain in power when differential variation is present. A three-step estimation approach, which applies the Laplace approximation and the EM algorithm, gives a computationally efficient method that is particularly well suited to 'large p, small n' situations. Third, we deal with the problem of variable selection where the number of putative variables is large, possibly much larger than the sample size. We develop a model-based, empirical Bayes approach. Treating the putative variables as random effects yields shrinkage estimation, which results in increased power and significantly faster convergence compared with simulation-based methods. Furthermore, we employ computational techniques that increase the speed of our algorithm, allow it to handle a very large number of putative variables, and control the multicollinearity in the model. The motivation for developing this approach is quantitative trait locus (QTL) analysis, but our method is applicable to a broad range of applications. We use two widely studied data sets and show that our model selection algorithm yields excellent results.
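The general idea of mixture-model testing for unequal variances across many features can be illustrated with a small sketch. This is not the model developed in the thesis: it is a simplified stand-in that assumes the per-feature log ratio of sample variances is approximately normal, with a null component N(0, v0) and a non-null component N(0, v1), v1 > v0, fitted by EM. The function names, simulation settings, and starting values are all hypothetical choices made for illustration.

```python
import numpy as np

def trigamma(x):
    # Leading terms of the asymptotic series for the trigamma function;
    # accurate enough here for x of about 2 or more (an approximation).
    return 1.0 / x + 1.0 / (2.0 * x**2) + 1.0 / (6.0 * x**3)

def normal_pdf(z, var):
    # Density of a zero-mean normal with variance `var`.
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)

def em_variance_mixture(z, v0, n_iter=200):
    """Fit z_i ~ (1 - pi1) * N(0, v0) + pi1 * N(0, v1), v1 > v0, by EM.

    v0 is treated as known (null variance of the log variance ratio).
    Returns the estimated pi1, v1, and posterior non-null probabilities.
    """
    pi1, v1 = 0.1, 4.0 * v0  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each feature is non-null.
        f0 = normal_pdf(z, v0)
        f1 = normal_pdf(z, v1)
        post = pi1 * f1 / (pi1 * f1 + (1.0 - pi1) * f0)
        # M-step: update the mixing weight and the non-null variance,
        # keeping v1 strictly above the null variance v0.
        pi1 = post.mean()
        v1 = max(np.sum(post * z**2) / np.sum(post), v0 * (1.0 + 1e-6))
    return pi1, v1, post

# Simulated two-sample data: p features, small n per group ('large p, small n').
rng = np.random.default_rng(0)
p, n1, n2 = 2000, 10, 10
is_nonnull = rng.random(p) < 0.1           # about 10% of features differ in variance
sd1 = np.ones(p)
# Non-null features have a standard deviation 3 times larger or smaller in group 1.
sd1[is_nonnull] = np.where(rng.random(is_nonnull.sum()) < 0.5, 3.0, 1.0 / 3.0)
x1 = rng.normal(size=(p, n1)) * sd1[:, None]
x2 = rng.normal(size=(p, n2))

# Per-feature statistic: log ratio of the unbiased sample variances.
z = np.log(x1.var(axis=1, ddof=1) / x2.var(axis=1, ddof=1))
# Null variance of z under normal data: trigamma((n1-1)/2) + trigamma((n2-1)/2).
v0 = trigamma((n1 - 1) / 2.0) + trigamma((n2 - 1) / 2.0)

pi1_hat, v1_hat, post = em_variance_mixture(z, v0)
print("estimated non-null fraction:", pi1_hat)
print("mean posterior, true non-null:", post[is_nonnull].mean())
print("mean posterior, true null:", post[~is_nonnull].mean())
```

Features would then be ranked or flagged by `post`, the posterior probability of belonging to the non-null component. Pooling information across all features to estimate `pi1` and `v1` is what distinguishes this kind of empirical Bayes procedure from running a separate variance test on each feature.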

Bibliographic details

  • Author

    Bar Haim

  • Year 2012
  • Original format PDF
  • Language en_US
