Journal: Molecular Biology and Evolution

Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions

Abstract

Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). The timing of and the potential IAVs involved in the next pandemic are currently unpredictable. We aim to build machine learning (ML) models to predict human-adaptive IAV nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed for their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. Then, the human adaptation of IAV sequences was predicted based on the characterized (d)nts. For the six segments, 9, 12, 11, 13, 10, and 9 human-adaptive (d)nts were optimized, respectively. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The results of the confusion matrix and the area under the receiver operating characteristic curve indicated a high performance of the ML models to predict human adaptation of IAVs. Our model performed well in predicting the human adaptation of the swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction using large IAV genomic data sets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, it allows the prediction of pandemic influenza.
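
The decomposition step in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the 12 nts are the four nucleotides counted at each of the three codon positions, and that the 48 dnts are the 16 dinucleotides counted at the three codon-position pairs (1-2, 2-3, and 3-1, the last spanning adjacent codons). The function name `codon_position_composition`, the feature labels such as `A1` or `GG31`, and the choice to normalize frequencies within each codon position are all illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def codon_position_composition(cds: str) -> dict:
    """Return 12 mononucleotide and 48 dinucleotide frequencies,
    each keyed by the codon position (1, 2, or 3) where it starts.

    Hypothetical helper: normalization is done within each codon
    position, which is an assumption, not the paper's stated method.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")

    mono = Counter()
    di = Counter()
    for i, base in enumerate(seq):
        pos = i % 3 + 1  # codon position of this base: 1, 2, or 3
        if base in NUCLEOTIDES:
            mono[(base, pos)] += 1
        pair = seq[i:i + 2]
        # Skip the final base (no pair) and pairs with ambiguous symbols.
        if len(pair) == 2 and all(b in NUCLEOTIDES for b in pair):
            di[(pair, pos)] += 1

    features = {}
    for pos in (1, 2, 3):
        nxt = pos % 3 + 1  # position of the second base in the pair
        mono_total = sum(mono[(b, pos)] for b in NUCLEOTIDES)
        for b in NUCLEOTIDES:
            features[f"{b}{pos}"] = mono[(b, pos)] / mono_total if mono_total else 0.0
        di_total = sum(di[(p, pos)] for p in DINUCLEOTIDES)
        for p in DINUCLEOTIDES:
            features[f"{p}{pos}{nxt}"] = di[(p, pos)] / di_total if di_total else 0.0
    return features


if __name__ == "__main__":
    # Toy input, not a real IAV segment.
    feats = codon_position_composition("ATGGCC")
    print(len(feats), feats["A1"], feats["GG31"])
```

Each sequence then yields a 60-dimensional feature vector (12 + 48) that could be passed to PCA or a classifier; which subset of features is retained per segment is determined in the paper by its own optimization and is not reproduced here.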
