The American Journal of Human Genetics

Making the Most of Clumping and Thresholding for Polygenic Scores


Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize the predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyperparameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyperparameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyperparameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyperparameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyperparameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
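
The two ideas in the abstract, a grid of C+T scores and a stacked combination of them, can be illustrated with a small Python sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`clump`, `ct_score`, `auc`, `fit_stack`), the simulated genotypes and toy summary statistics, the choice of only three tuned hyperparameters (clumping r² threshold, a window counted in variant positions, and the p value threshold; the abstract does not name the three additional ones), and the ridge-penalized logistic regression fitted by plain gradient descent, which stands in for the efficient penalized regression mentioned above.

```python
import math
import numpy as np

def clump(G, pvals, r2_max, window):
    """Greedy clumping: visit variants by increasing p value, keep each one not
    yet removed, and remove neighbours within `window` positions whose squared
    correlation with it exceeds `r2_max`."""
    n, m = G.shape
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against monomorphic variants
    Z = (G - G.mean(axis=0)) / sd           # standardized genotypes
    removed = np.zeros(m, dtype=bool)
    kept = []
    for j in np.argsort(pvals, kind="stable"):
        if removed[j]:
            continue
        kept.append(int(j))
        lo, hi = max(0, j - window), min(m, j + window + 1)
        r2 = (Z[:, lo:hi].T @ Z[:, j] / n) ** 2
        removed[lo:hi] |= r2 > r2_max
    return np.array(sorted(kept), dtype=int)

def ct_score(G, betas, pvals, r2_max, window, p_max):
    """One C+T score: clump, keep variants with p <= p_max, sum weighted alleles."""
    keep = clump(G, pvals, r2_max, window)
    keep = keep[pvals[keep] <= p_max]
    return G[:, keep] @ betas[keep]

def auc(score, y):
    """Probability that a random case outranks a random control (ties count 1/2)."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def fit_stack(S, y, lam=0.1, lr=0.1, n_iter=500):
    """Ridge-penalized logistic regression on standardized C+T scores,
    fitted by gradient descent (a stand-in, not the paper's solver)."""
    mu, sd = S.mean(axis=0), S.std(axis=0)
    sd[sd == 0] = 1.0                       # constant score columns stay at zero
    X = (S - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
        b -= lr * np.mean(p - y)
    return lambda S_new: ((S_new - mu) / sd) @ w + b

# Toy data (simulated, hypothetical): 400 individuals, 30 variants.
rng = np.random.default_rng(0)
n, m = 400, 30
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
G[:, 1] = G[:, 0]                           # a pair of variants in perfect LD
liability = G[:, [0, 10, 20]].sum(axis=1) + rng.normal(size=n)
y = (liability > np.median(liability)).astype(float)
gwas, train, test = slice(0, 200), slice(200, 300), slice(300, 400)

# Toy "GWAS" on the first 200 individuals: marginal covariance as effect size,
# normal-approximation p value from the marginal correlation.
Gg, yg = G[gwas], y[gwas]
Zg = (Gg - Gg.mean(axis=0)) / Gg.std(axis=0)
betas = Zg.T @ (yg - yg.mean()) / len(yg)
pvals = np.array([math.erfc(abs(b) / yg.std() * math.sqrt(len(yg)) / math.sqrt(2))
                  for b in betas])

# Grid over (r2_max, window, p_max); one score column per grid point.
grid = [(r2, w, p) for r2 in (0.2, 0.8) for w in (2, 10) for p in (1e-3, 0.1, 1.0)]
S = np.column_stack([ct_score(G, betas, pvals, *g) for g in grid])

# Standard C+T picks the single best grid point; stacking combines all columns.
best = max(range(len(grid)), key=lambda k: auc(S[train, k], y[train]))
stack = fit_stack(S[train], y[train])
print("best single C+T:", grid[best], "test AUC:", auc(S[test, best], y[test]))
print("stacked scores, test AUC:", auc(stack(S[test]), y[test]))
```

The sketch recomputes the clumping for every grid point, which is only reasonable at toy scale; the runtime quoted in the abstract depends on an efficient implementation that this example does not attempt to reproduce. The toy AUC values carry no meaning beyond showing how the two selection strategies are compared on held-out individuals.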
