Journal: BMC Proceedings

Using penalized regression to predict phenotype from SNP data


Abstract

Background: In a typical genome-enabled prediction problem there are many more predictor variables than observed responses. This rules out ordinary multiple linear regression, because the ordinary least squares estimators of the regression coefficients are not uniquely defined. To overcome this problem, penalized regression methods have been proposed, which shrink the coefficients toward zero.

Methods: We explore prediction of phenotype from single nucleotide polymorphism (SNP) data in the GAW20 data set using a penalized regression approach, LASSO (least absolute shrinkage and selection operator) regression. We use 10-fold cross-validation to assess predictive performance and 10-fold nested cross-validation to choose the penalty parameter.

Results: Analyzing approximately 600,000 SNPs, we find that when the sample comprises a few hundred individuals, SNP effects are heavily penalized, resulting in poor predictive performance. Increasing the sample size to a few thousand individuals results in much weaker penalization of the true effects, which greatly improves prediction.

Conclusions: LASSO regression shrinks the regression coefficients heavily, and it requires large sample sizes (several thousand individuals) to achieve good prediction.
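The procedure described under Methods can be illustrated with a minimal sketch. The abstract does not say which software was used, so everything below is an assumption made for illustration: the function names (`lasso_cd`, `nested_cv`, and the helpers) are hypothetical, the solver is a plain cyclic coordinate descent for the objective (1/2n)·||y − b0 − Xb||² + λ·||b||₁, and the data are a small simulated set of genotypes coded 0/1/2, not the GAW20 data. The demo uses 5 folds and 12 SNPs only to keep the run time short; the function defaults follow the 10-fold setup named in the abstract.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator: shrink z toward zero by g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=100, tol=1e-6):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:  # SNP has only zero genotypes in this subset
                continue
            # Correlation of SNP j with the partial residual (its own effect added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= step * X[i][j]
                beta[j] = new
                max_step = max(max_step, abs(step))
        shift = sum(resid) / n  # re-fit the unpenalised intercept
        b0 += shift
        resid = [r - shift for r in resid]
        if max_step < tol:
            break
    return b0, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def split(X, y, test):
    held = set(test)
    train = [i for i in range(len(y)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def cv_mse(X, y, lam, folds):
    """Mean held-out squared error of one penalty value over the given folds."""
    errs = []
    for test in folds:
        Xtr, ytr, Xte, yte = split(X, y, test)
        errs.append(mse(yte, predict(lasso_cd(Xtr, ytr, lam), Xte)))
    return sum(errs) / len(errs)

def nested_cv(X, y, lambdas, k_outer=10, k_inner=10, seed=0):
    """Outer folds estimate prediction error; inner folds pick the penalty."""
    rng = random.Random(seed)
    outer = []
    for test in make_folds(len(y), k_outer, rng):
        Xtr, ytr, Xte, yte = split(X, y, test)
        inner = make_folds(len(ytr), k_inner, rng)
        best = min(lambdas, key=lambda lam: cv_mse(Xtr, ytr, lam, inner))
        outer.append(mse(yte, predict(lasso_cd(Xtr, ytr, best), Xte)))
    return sum(outer) / len(outer)

# Simulated toy data (an assumption, not GAW20): 50 individuals, 12 SNPs,
# genotypes coded 0/1/2, two SNPs with a true effect on the phenotype.
rng = random.Random(1)
n, p = 50, 12
X = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(p)] for _ in range(n)]
y = [1.0 * row[0] - 0.8 * row[3] + rng.gauss(0, 0.5) for row in X]
err = nested_cv(X, y, lambdas=[0.01, 0.1, 0.5], k_outer=5, k_inner=5)
print("nested-CV mean squared error:", round(err, 3))
```

Because the penalty is chosen only from the inner folds, the outer held-out individuals never influence it, so the outer error is not biased by the tuning step. Calling `lasso_cd` with a very large `lam` sets every coefficient to exactly zero, which is the extreme form of the heavy shrinkage the abstract reports for small samples.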
