Machine Learning Methods as a Tool for Predicting Risk of Illness Applying Next-Generation Sequencing Data

Abstract

Next-generation sequencing (NGS) data present an untapped potential to improve microbial risk assessment (MRA) through increased specificity and redefinition of the hazard. Most MRA models do not account for differences in survivability and virulence among strains. The potential of machine learning algorithms for predicting the risk/health burden at the population level while inputting large and complex NGS data was explored with Listeria monocytogenes as a case study. Listeria data consisted of a percentage similarity matrix from genome assemblies of 38 and 207 strains of clinical and food origin, respectively. The Basic Local Alignment Search Tool (BLAST) was used to align the assemblies against a database of 136 virulence and stress resistance genes. The outcome variable was frequency of illness, which is the percentage of reported cases associated with each strain. These frequency data were discretized into seven ordinal outcome categories and used for supervised machine learning and model selection from five ensemble algorithms. There was no significant difference in accuracy between the models, and support vector machine with linear kernel was chosen for further inference (accuracy of 89% [95% CI: 68%, 97%]). The virulence genes FAM002725, FAM002728, FAM002729, InlF, InlJ, InlK, IisY, IisD, IisX, IisH, IisB, lmo2026, and FAM003296 were important predictors of higher frequency of illness. InlF was uniquely truncated in the sequence type 121 strains. Most important risk predictor genes occurred at highest prevalence among strains from ready-to-eat, dairy, and composite foods. We foresee that the findings and approaches described offer the potential for rethinking the current approaches in MRA.
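
Two steps mentioned in the abstract, discretizing illness frequency into seven ordinal categories and reporting accuracy with a 95% confidence interval, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the equal-width binning rule, the Wilson score interval, the function names `discretize` and `wilson_interval`, and all input numbers, which are invented for illustration and not taken from the paper.

```python
import math


def discretize(freqs, n_bins=7):
    """Map each illness frequency (percent) to an ordinal category 1..n_bins.

    Assumption: equal-width bins over the observed range; the abstract does
    not state which binning rule was used.
    """
    lo, hi = min(freqs), max(freqs)
    width = (hi - lo) / n_bins
    if width == 0:
        return [1 for _ in freqs]
    # The maximum value would land in bin n_bins + 1, so clamp it to n_bins.
    return [min(int((f - lo) / width) + 1, n_bins) for f in freqs]


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical illness frequencies (percent of reported cases per strain).
frequencies = [0.0, 0.5, 1.2, 3.4, 7.0, 10.5, 14.0]
categories = discretize(frequencies)

# Hypothetical held-out result: 17 of 19 strains classified correctly.
low, high = wilson_interval(17, 19)
print(categories)
print(f"accuracy = {17 / 19:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The resulting ordinal labels would then serve as the outcome variable for a classifier such as a linear-kernel support vector machine, with gene similarity percentages as the features.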

Bibliographic record

  • Source
    Risk Analysis | 2019, Issue 6 | pp. 1397-1413 | 17 pages
  • Author affiliations

    Tech Univ Denmark, Natl Food Inst, Div Epidemiol & Microbial Genom, Kemitorvet,Bldg 204,Room 104, DK-2800 Lyngby, Denmark;

    Univ Paris Est, Agence Natl Secur Sanit Alimentat Environm & Trav, Lab Food Safety, Maisons Alfort, France;

    Tech Univ Denmark, Natl Food Inst, Div Epidemiol & Microbial Genom, Kemitorvet,Bldg 204,Room 104, DK-2800 Lyngby, Denmark;

    Univ Paris Est, Agence Natl Secur Sanit Alimentat Environm & Trav, Lab Food Safety, Maisons Alfort, France;

    Tech Univ Denmark, Natl Food Inst, Div Epidemiol & Microbial Genom, Kemitorvet,Bldg 204,Room 104, DK-2800 Lyngby, Denmark;

    Tech Univ Denmark, Natl Food Inst, Div Epidemiol & Microbial Genom, Kemitorvet,Bldg 204,Room 104, DK-2800 Lyngby, Denmark;

  • Original format: PDF
  • Text language: English
  • Keywords

    Listeria monocytogenes; machine learning; microbial risk assessment; support vector machines; whole genome sequencing;

  • Date added to database: 2022-08-18 04:17:44
