Published in: Bioinformatics

High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans


Abstract

Motivation: With complex traits and diseases having potential genetic contributions of thousands of genetic factors, and with current genotyping arrays consisting of millions of single nucleotide polymorphisms (SNPs), powerful high-dimensional statistical techniques are needed to comprehensively model the genetic variance. Machine learning techniques have many advantages, including lack of parametric assumptions, and high power and flexibility.

Results: We have applied three machine learning approaches, Random Forest Regression (RFR), Boosted Regression Tree (BRT) and Support Vector Regression (SVR), to the prediction of warfarin maintenance dose in a cohort of African Americans. We have developed a multi-step approach that selects SNPs, builds prediction models with different subsets of selected SNPs along with known associated genetic and environmental variables, and tests the discovered models in a cross-validation framework. Preliminary results indicate that our modeling approach gives much higher accuracy than previous models for warfarin dose prediction. A model size of 200 SNPs (in addition to the known genetic and environmental variables) gives the best accuracy. The R² between the predicted and actual square root of warfarin dose in this model was on average 66.4% for RFR, 57.8% for SVR and 56.9% for BRT. Thus RFR had the best accuracy, but all three techniques achieved better performance than the current published R² of 43% in a sample of mixed ethnicity, and 27% in an African American sample. In summary, machine learning approaches for high-dimensional pharmacogenetic prediction, and for prediction of clinical continuous traits of interest, hold great promise and warrant further research.
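The accuracy figures above are averages of R² computed on the square-root dose scale. The following is a minimal sketch of how such a metric could be computed across cross-validation folds. It assumes that R² means the squared Pearson correlation between predicted and actual values; the abstract does not state which definition was used. The function names `pearson_r2` and `mean_cv_r2` and the fold layout are hypothetical and are not taken from the paper.

```python
import math


def pearson_r2(predicted, actual):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is the R^2 definition; the abstract does not specify it.
    """
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("R^2 is undefined for a constant sequence")
    return cov * cov / (var_p * var_a)


def mean_cv_r2(folds):
    """Average R^2 over held-out folds.

    Each fold is a hypothetical pair (predicted_sqrt_dose, actual_dose):
    predictions are already on the square-root scale, while actual doses
    are raw and are square-root transformed here before comparison.
    """
    scores = [
        pearson_r2(pred_sqrt, [math.sqrt(d) for d in doses])
        for pred_sqrt, doses in folds
    ]
    return sum(scores) / len(scores)
```

Scoring each held-out fold separately and then averaging matches the "on average" wording in the abstract, although the number of folds and any repetition scheme are not given there.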
