Journal of Translational Medicine

Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population


Abstract

Chronic obstructive pulmonary disease (COPD) is a major public health problem and cause of mortality worldwide. However, early-stage COPD is usually not recognized or diagnosed. It is necessary to establish a risk model to predict COPD development. A total of 441 COPD patients and 192 control subjects were recruited, and 101 single-nucleotide polymorphisms (SNPs) were determined using the MassArray assay. With 5 clinical features as well as SNPs, 6 predictive models were established and evaluated in the training set and test set by the confusion matrix, AU-ROC, AU-PRC, sensitivity (recall), specificity, accuracy, F1 score, MCC, PPV (precision) and NPV. The selected features were ranked. Nine SNPs were significantly associated with COPD. Among them, 6 SNPs (rs1007052, OR = 1.671, P = 0.010; rs2910164, OR = 1.416, P < 0.037; rs473892, OR = 1.473, P < 0.044; rs161976, OR = 1.594, P < 0.044; rs159497, OR = 1.445, P < 0.045; and rs9296092, OR = 1.832, P < 0.045) were risk factors for COPD, while 3 SNPs (rs8192288, OR = 0.593, P < 0.015; rs20541, OR = 0.669, P < 0.018; and rs12922394, OR = 0.651, P < 0.022) were protective factors for COPD development. In the training set, KNN, LR, SVM, DT and XGBoost obtained AU-ROC values above 0.82 and AU-PRC values above 0.92. Among these models, XGBoost obtained the highest AU-ROC (0.94), AU-PRC (0.97), accuracy (0.91), precision (0.95), F1 score (0.94), MCC (0.77) and specificity (0.85), while MLP obtained the highest sensitivity (recall) (0.99) and NPV (0.87). In the validation set, KNN, LR and XGBoost obtained AU-ROC and AU-PRC values above 0.80 and 0.85, respectively. KNN had the highest precision (0.82), and KNN and LR shared both the highest accuracy (0.81) and the highest F1 score (0.86). Both DT and MLP obtained sensitivity (recall) and NPV values above 0.94 and 0.84, respectively.
In the feature importance analyses, we found that AQCI, age, and BMI had the greatest impact on the predictive abilities of the models, while SNPs, sex and smoking were less important. The KNN, LR and XGBoost models showed excellent overall predictive power, and the use of machine learning tools combining both clinical and SNP features was suitable for predicting the risk of COPD development.
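
The threshold-based metrics named in the abstract (sensitivity, specificity, accuracy, F1 score, MCC, PPV and NPV) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the function name `confusion_metrics` and the counts in the usage line are hypothetical, chosen only for illustration. AU-ROC and AU-PRC are left out because they need the model's continuous scores rather than a single confusion matrix.

```python
from math import sqrt


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is empty
    # (an illustrative convention, not something the abstract specifies).
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives, with COPD taken as the positive class.
    """
    sensitivity = _ratio(tp, tp + fn)          # recall
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)                  # precision
    npv = _ratio(tn, tn + fn)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    # Matthews correlation coefficient, which uses all four cells.
    mcc = _ratio(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the study.
metrics = confusion_metrics(tp=80, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced groups such as the 441 patients and 192 controls reported here, accuracy alone can look favorable for a model that mostly predicts the majority class, which is one reason to read MCC and specificity alongside it.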
