Journal: MBio

A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems



Abstract

Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen-relevant neoplasias (SRNs) (n = 490 patients; 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.
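The abstract reports each model's performance as an AUROC with an interquartile range, which implies a distribution of AUROC values over repeated evaluations. The following is a minimal sketch of how such a summary could be computed. It is not the authors' pipeline: the function names (`auroc`, `median_and_iqr`, `simulate_split_aurocs`), the number of splits, the test-set size, and the synthetic scores are all assumptions made for illustration, and no study data are used.

```python
import random
import statistics


def auroc(labels, scores):
    """AUROC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def median_and_iqr(values):
    """Return (median, (first quartile, third quartile))."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


def simulate_split_aurocs(n_splits=100, n_test=98, seed=0):
    """Hypothetical stand-in for repeated train/test splits.

    Each 'split' draws synthetic scores in which cases are shifted
    upward relative to controls. A real analysis would instead score
    held-out samples with a trained model.
    """
    rng = random.Random(seed)
    n_cases = n_test // 2
    labels = [1] * n_cases + [0] * (n_test - n_cases)
    results = []
    for _ in range(n_splits):
        scores = [rng.gauss(0.7 * y, 1.0) for y in labels]
        results.append(auroc(labels, scores))
    return results


if __name__ == "__main__":
    aurocs = simulate_split_aurocs()
    med, (q1, q3) = median_and_iqr(aurocs)
    print(f"AUROC {med:.3f} (IQR, {q1:.3f} to {q3:.3f})")
```

Reporting the spread alongside the central value shows how much a model's apparent performance depends on which samples land in the test set, which is useful when two models differ by only a small margin in AUROC.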
