Journal: BioData Mining

A maximum flow-based network approach for identification of stable noncoding biomarkers associated with the multigenic neurological condition, autism

Abstract

Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often result in unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders. We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet shows great variability in the features selected from over 230,000 possible variants. To improve model stability, we extract the variants assigned non-zero weights in each of five cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features identified by the original classifier. Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way towards biomarker-based diagnosis methods for complex genetic disorders.
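
The abstract does not spell out how the flow network is constructed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each cross-validation fold forms one layer of the network, that a variant in one fold is linked to a variant in the next fold when the two are identical or share an LD block, and that every variant node has unit capacity. Under those assumptions, the maximum flow counts the disjoint chains of LD-linked variants that run through every fold. The variant names (`v1`, `v2`, ...), the `ld_block` mapping, and the helper functions are all hypothetical, and the toy uses three folds rather than five to stay short.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    total = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

def build_network(fold_selections, in_ld):
    """Layered network: one layer per fold, edges only between LD-linked variants."""
    cap = defaultdict(dict)
    last = len(fold_selections) - 1
    for i, variants in enumerate(fold_selections):
        for v in variants:
            # Split each variant into in/out nodes to enforce unit node capacity.
            cap[(i, v, "in")][(i, v, "out")] = 1
            if i == 0:
                cap["source"][(i, v, "in")] = 1
            if i == last:
                cap[(i, v, "out")]["sink"] = 1
            else:
                for w in fold_selections[i + 1]:
                    if in_ld(v, w):
                        cap[(i, v, "out")][(i + 1, w, "in")] = 1
    return cap

# Hypothetical toy data: variants with non-zero weight in each fold.
folds = [{"v1", "v4", "v7"}, {"v2", "v4", "v8"}, {"v1", "v5", "v9"}]
# Hypothetical LD blocks; a real analysis would derive LD from genotype data.
ld_block = {"v1": "A", "v2": "A", "v4": "B", "v5": "B",
            "v7": "C", "v8": "D", "v9": "E"}

def in_ld(v, w):
    return v == w or ld_block[v] == ld_block[w]

cap = build_network(folds, in_ld)
flow, residual = max_flow(cap, "source", "sink")

# First-fold variants that carry flow start a chain spanning every fold.
stable_blocks = {
    ld_block[node[1]]
    for node, c in cap["source"].items()
    if c - residual["source"][node] > 0
}
print(flow, sorted(stable_blocks))  # 2 ['A', 'B']
```

In this toy, blocks A and B are represented in every fold even though the exact variant chosen differs between folds, while blocks C, D, and E appear in only one fold and carry no flow. That is the intuition behind using LD to recover stability that a per-variant comparison would miss.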
