Journal: Scientific Reports

Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm

Abstract

A genetic risk score could assist clinical diagnosis of complex diseases with high heritability. Using large-scale genome-wide association (GWA) data, the current study constructed a genetic risk model for bipolar disorder (BPD) with a machine learning approach. The GWA dataset of BPD from the Genetic Association Information Network was used as the training data for model construction, and the Systematic Treatment Enhancement Program (STEP) GWA data were used as the validation dataset. A random forest algorithm was applied to pre-filtered markers, and variable importance indices were assessed. In total, 289 candidate markers with good discriminability were selected by the random forest procedure; the area under the receiver operating characteristic curve was 0.944 (0.935–0.953) in the training set and 0.702 (0.681–0.723) in the STEP dataset. Using a score cutoff of 184, the sensitivity and specificity for BPD were 0.777 and 0.854, respectively. Pathway analyses revealed important biological pathways for the identified genes. In conclusion, the present study identified informative genetic markers that differentiate BPD from healthy controls with acceptable discriminability in the validation dataset. In the future, diagnostic classification can be further improved by assessing more comprehensive clinical risk factors and analysing them jointly with genetic data in large samples.
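
The two kinds of figures quoted in the abstract, sensitivity and specificity at a score cutoff and the area under the ROC curve, can be illustrated with a minimal sketch. The scores and labels below are made-up toy values, not data from the study, and the function names are hypothetical. Treating a score at or above the cutoff as a predicted case is also an assumption, since the abstract does not state the direction of the rule.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= cutoff predicts a case.

    labels: 1 = case, 0 = control. The ">=" direction is an assumption.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)  # cases flagged
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)   # cases missed
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)   # controls cleared
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)  # controls flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case scores higher than a randomly chosen control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): four cases followed by four controls.
toy_scores = [190, 200, 170, 186, 150, 160, 185, 175]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=184)
toy_auc = auc(toy_scores, toy_labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={toy_auc:.3f}")
```

The pairwise definition of the AUC is used here because it needs no threshold sweep; it is quadratic in the sample size, so a rank-based computation would be preferable on datasets of GWA scale.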
