
Ridge Regression in Prediction Problems: Automatic Choice of the Ridge Parameter

Abstract

To date, numerous genetic variants have been identified as associated with diverse phenotypic traits. However, identified associations generally explain only a small proportion of trait heritability, and the predictive power of models incorporating only known-associated variants has been small. Multiple regression is a popular framework in which to consider the joint effect of many genetic variants simultaneously. Ordinary multiple regression is seldom appropriate in the context of genetic data, due to the high dimensionality of the data and the correlation structure among the predictors. There has been a resurgence of interest in the use of penalised regression techniques to circumvent these difficulties. In this paper, we focus on ridge regression, a penalised regression approach that has been shown to offer good performance in multivariate prediction problems. One challenge in the application of ridge regression is the choice of the ridge parameter that controls the amount of shrinkage of the regression coefficients. We present a method to determine the ridge parameter based on the data, with the aim of good performance in high-dimensional prediction problems. We establish a theoretical justification for our approach, and demonstrate its performance on simulated genetic data and on a real data example. Fitting a ridge regression model to hundreds of thousands to millions of genetic variants simultaneously presents computational challenges. We have developed an R package, ridge, which addresses these issues. The package implements the automatic choice of ridge parameter presented in this paper and is freely available from CRAN.
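
As a rough illustration of how the ridge parameter controls shrinkage, the sketch below fits the standard ridge estimator, the solution of (X'X + lam*I) beta = X'y, for several fixed penalty values on a tiny made-up data set with two correlated predictors. It is not the automatic choice proposed in the paper and does not use the ridge R package. The function names `ridge_fit` and `l2_norm`, the toy data, and the `lam` values are all assumptions made for illustration, and the model has no intercept or predictor scaling.

```python
import math


def ridge_fit(X, y, lam):
    """Return beta solving (X^T X + lam * I) beta = X^T y (no intercept)."""
    n, p = len(X), len(X[0])
    # Build the p x p system matrix A = X^T X + lam * I and the vector b = X^T y.
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(A[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (b[j] - tail) / A[j][j]
    return beta


def l2_norm(v):
    """Euclidean length of a coefficient vector."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical toy data: two strongly correlated predictors, four observations.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
y = [2.1, 3.9, 6.2, 7.9]

if __name__ == "__main__":
    # Arbitrary penalty values; lam = 0 reduces to ordinary least squares.
    for lam in (0.0, 1.0, 100.0):
        beta = ridge_fit(X, y, lam)
        print(lam, beta, l2_norm(beta))
```

Larger values of `lam` pull the coefficient vector towards zero, which is the shrinkage the abstract refers to. Picking `lam` from the data is the problem the paper addresses, and this sketch does not attempt it.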
