Journal: Journal of biomedical informatics

Data mining methods for classification of Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) using non-derivatized tandem MS neonatal screening data.


Abstract

Newborn screening programs for severe metabolic disorders using tandem mass spectrometry are widely used. Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) is the most prevalent mitochondrial fatty acid oxidation defect (1:15,000 newborns), and early detection of this metabolic disease has been shown to decrease mortality and improve outcome. In previous studies, data mining methods applied to derivatized tandem MS datasets have shown high classification accuracies. However, no machine learning methods have yet been applied to datasets based on non-derivatized screening methods. A dataset of 44,159 blood samples was collected using a non-derivatized screening method as part of systematic newborn screening by the PCMA screening center (Belgium). Twelve MCADD cases were present in this partially MCADD-enriched dataset. We extended three data mining methods, namely C4.5 decision trees, logistic regression and ridge logistic regression, with a parameter and threshold optimization method and evaluated their applicability as a diagnostic support tool. Within a stratified cross-validation setting, a grid search was performed for each model over a wide range of model parameters, sets of included variables and classification thresholds. The best performing model used ridge logistic regression and achieved a sensitivity of 100%, a specificity of 99.987% and a positive predictive value of 32% (recalibrated for a real population), obtained in a stratified cross-validation setting. These results were further validated on an independent test set. Using a method that combines ridge logistic regression with variable selection and threshold optimization, a significantly improved performance was achieved compared to the current state of the art for derivatized data, while retaining more interpretability and requiring fewer variables. The results indicate the potential value of data mining methods as a diagnostic support tool.
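The abstract reports a positive predictive value "recalibrated for a real population" without describing the procedure. The sketch below is one common way to do such a recalibration and is an assumption, not the authors' method: it applies Bayes' rule to the reported sensitivity and specificity at a chosen prevalence. The function name `ppv` and the constants are hypothetical. The prevalences come from the figures quoted above (1:15,000 newborns, and 12 cases in 44,159 samples for the enriched dataset).

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule.

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive calls expected; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Rounded figures quoted in the abstract.
SENSITIVITY = 1.0        # 100%
SPECIFICITY = 0.99987    # 99.987%

# Assumption: the "real population" prevalence is the quoted 1:15,000.
POPULATION_PREVALENCE = 1 / 15000
# Assumption: enrichment is summarized by 12 cases in 44,159 samples.
ENRICHED_PREVALENCE = 12 / 44159

population_ppv = ppv(SENSITIVITY, SPECIFICITY, POPULATION_PREVALENCE)
enriched_ppv = ppv(SENSITIVITY, SPECIFICITY, ENRICHED_PREVALENCE)

print(f"PPV at population prevalence: {population_ppv:.3f}")
print(f"PPV at enriched prevalence:   {enriched_ppv:.3f}")
```

With these rounded inputs the population-level value comes out at about 34%, close to but not identical with the reported 32%. The gap may reflect rounding of the specificity or a different prevalence used by the authors. The enriched prevalence gives a noticeably higher value, which shows why a PPV measured on an enriched dataset has to be recalibrated before it is quoted for routine screening.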
