American Journal of Human Genetics

Making the Most of Clumping and Thresholding for Polygenic Scores

Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize the predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544–0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580–0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
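The C+T grid and the stacking step described above can be illustrated with a short, self-contained NumPy sketch on simulated genotypes. It is not the authors' implementation. The abstract does not name the three extra hyper-parameters, so this sketch assumes, for illustration only, a grid over the clumping r² threshold, the clumping window size and the p value threshold. The grid values, the simulated data and the helper names (`clump`, `ct_grid`, `auc`, `fit_stack`) are all hypothetical. For the stacking step the abstract only says "an efficient penalized regression"; the sketch swaps in a plain L2-penalized logistic regression fitted by gradient descent.

```python
import math
import numpy as np

# Illustrative sketch only: helper names, the hyper-parameter grid and the
# penalized regression below are assumptions, not the authors' implementation.

def clump(G, pvals, pos, r2_thr, window):
    """Greedy LD clumping: visit variants from smallest to largest p value,
    keep each variant not yet removed, and remove every variant within
    `window` (units of `pos`) whose r^2 with the kept variant exceeds r2_thr."""
    Gc = G - G.mean(axis=0)
    norms = np.sqrt((Gc ** 2).sum(axis=0))
    removed = np.zeros(G.shape[1], dtype=bool)
    kept = []
    for j in np.argsort(pvals):
        if removed[j]:
            continue
        kept.append(j)
        near = np.flatnonzero(np.abs(pos - pos[j]) <= window)
        r = (Gc[:, near].T @ Gc[:, j]) / (norms[near] * norms[j])
        removed[near[r ** 2 > r2_thr]] = True
    return np.array(sorted(kept), dtype=int)

def ct_grid(G, beta, pvals, pos, r2_grid, window_grid, p_grid):
    """One C+T score per (r2, window, p) grid point. Clumping does not depend
    on the p value threshold, so it is computed once per (r2, window)."""
    scores, params = [], []
    for r2 in r2_grid:
        for w in window_grid:
            idx = clump(G, pvals, pos, r2, w)
            for p_thr in p_grid:
                sel = idx[pvals[idx] < p_thr]          # thresholding step
                scores.append(G[:, sel] @ beta[sel])   # all zeros if sel is empty
                params.append((r2, w, p_thr))
    return np.column_stack(scores), params

def auc(y, s):
    """Mann-Whitney estimate of the AUC (tied scores are not averaged)."""
    ranks = np.empty(len(s))
    ranks[np.argsort(s, kind="stable")] = np.arange(1, len(s) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def fit_stack(S, y, lam=1.0, n_iter=3000):
    """L2-penalized logistic regression on standardized C+T scores, fitted by
    gradient descent (a stand-in for the paper's penalized regression)."""
    mu, sd = S.mean(axis=0), S.std(axis=0)
    sd[sd == 0] = 1.0                                  # constant (empty) scores
    Z = (S - mu) / sd
    n = len(y)
    # step size 1/L, with L a Lipschitz bound on the gradient of the loss
    lr = 1.0 / (max(np.linalg.eigvalsh(Z.T @ Z / n).max(), 1.0) / 4 + lam / n)
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        g = 1 / (1 + np.exp(-(Z @ w + b))) - y        # d(log-loss)/d(logit)
        w -= lr * (Z.T @ g / n + lam * w / n)
        b -= lr * g.mean()
    return lambda S_new: ((S_new - mu) / sd) @ w + b

# --- toy simulation (hypothetical data) ---
rng = np.random.default_rng(0)
n, m = 2000, 300
G = np.empty((n, m), dtype=int)
G[:, 0] = rng.binomial(2, 0.3, n)
for j in range(1, m):  # neighbouring variants often share genotypes -> LD
    G[:, j] = np.where(rng.random(n) < 0.7, G[:, j - 1], rng.binomial(2, 0.3, n))
pos = np.arange(m) * 1000                              # positions in bp
true_beta = np.zeros(m)
true_beta[rng.choice(m, 20, replace=False)] = rng.normal(0, 0.3, 20)
liab = G @ true_beta + rng.normal(0, 1, n)
y = (liab > np.quantile(liab, 0.8)).astype(float)      # ~20% cases

# "GWAS" on the first 1000 individuals: marginal effects, p values from z ~ r*sqrt(n)
Gg = G[:1000] - G[:1000].mean(axis=0)
yg = y[:1000] - y[:1000].mean()
beta = Gg.T @ yg / (Gg ** 2).sum(axis=0)
r = Gg.T @ yg / np.sqrt((Gg ** 2).sum(axis=0) * (yg ** 2).sum())
pvals = np.array([math.erfc(abs(v) * math.sqrt(1000) / math.sqrt(2)) for v in r])

# C+T grid on the remaining 1000: tune on 500, evaluate on 500
S, params = ct_grid(G[1000:], beta, pvals, pos,
                    [0.2, 0.5, 0.8], [20_000, 100_000], [1e-4, 1e-3, 1e-2, 0.1])
S_tr, S_te, y_tr, y_te = S[:500], S[500:], y[1000:1500], y[1500:]
best = max(range(S.shape[1]), key=lambda k: auc(y_tr, S_tr[:, k]))
predict = fit_stack(S_tr, y_tr)
print("best single C+T", params[best], "test AUC", round(auc(y_te, S_te[:, best]), 3))
print("stacked", "test AUC", round(auc(y_te, predict(S_te)), 3))
```

In this sketch, choosing `best` on the training half mirrors standard C+T tuning (keep one grid point), while `fit_stack` mirrors the SCT idea of combining every grid score instead. Reusing one clumping result across all p value thresholds is one plausible reason a large grid is cheap to compute, although the abstract does not describe the authors' exact strategy.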