Journal: Genetics: A Periodical Record of Investigations Bearing on Heredity and Variation

Efficient Implementation of Penalized Regression for Genetic Risk Prediction

Abstract

Polygenic Risk Scores (PRS) combine genotype information across many single-nucleotide polymorphisms (SNPs) to give a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify individuals at high genetic risk for a given disease. The "Clumping+Thresholding" (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method for the joint estimation of SNP effects using individual-level data, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. We also provide an implementation of penalized linear regression for quantitative traits. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. Overall, we find that PLR achieves equal or higher predictive performance than C+T in most scenarios considered, while being scalable to biobank data. In particular, we find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, in simulations, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR. We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUC values of 89% and 82.5%, respectively. Applying penalized linear regression to 350,000 individuals of the UK Biobank, we predict height with a larger correlation than with the bes
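
To make the idea of jointly estimating SNP effects with a penalty more concrete, here is a minimal sketch of lasso-penalized logistic regression scored by AUC. It is not the paper's implementation: it swaps in plain proximal gradient descent (ISTA), uses only the Python standard library, runs on toy simulated genotypes, and fixes the penalty `lam` by hand, whereas the abstract describes automatic hyper-parameter choices. All names here (`fit_l1_logistic`, `simulate`, `auc`, the allele frequency of 0.3, the effect sizes) are assumptions made for illustration.

```python
import math
import random


def soft_threshold(x, t):
    """Proximal operator of the L1 norm: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=1.0, n_iter=150):
    """Minimize mean log-loss + lam * ||beta||_1 by proximal gradient descent.

    The intercept is not penalized. The default step assumes roughly
    centered predictors with small variance, as in the toy data below.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = intercept + sum(b * x for b, x in zip(beta, xi))
            r = sigmoid(z) - yi  # residual on the probability scale
            grad0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        intercept -= step * grad0 / n
        # Gradient step on the smooth part, then soft-thresholding for L1.
        beta = [soft_threshold(b - step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return intercept, beta


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def simulate(n, p, causal, rng, freq=0.3):
    """Toy independent SNPs: allele counts 0/1/2, centered by 2 * freq.

    `causal` maps SNP index -> log-odds effect (hypothetical values).
    """
    X, y = [], []
    for _ in range(n):
        x = [sum(rng.random() < freq for _ in range(2)) - 2 * freq
             for _ in range(p)]
        z = sum(effect * x[j] for j, effect in causal.items())
        X.append(x)
        y.append(1 if rng.random() < sigmoid(z) else 0)
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    causal = {0: 1.5, 1: -1.5, 2: 1.0}
    X_train, y_train = simulate(400, 20, causal, rng)
    X_test, y_test = simulate(400, 20, causal, rng)
    b0, beta = fit_l1_logistic(X_train, y_train, lam=0.03)
    prs = [b0 + sum(b * x for b, x in zip(beta, xi)) for xi in X_test]
    print("non-zero SNP effects:", sum(b != 0.0 for b in beta))
    print("test AUC:", round(auc(prs, y_test), 3))
```

The fitted linear predictor plays the role of a PRS here: each individual's score is a weighted sum of allele counts, with weights estimated jointly rather than one SNP at a time. The toy SNPs are simulated independently, so this sketch does not reproduce the correlated-SNP setting in which the abstract reports the largest gains over C+T.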
