Journal: Progress in Artificial Intelligence
Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines

Abstract

Background: Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella.

Results: We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis.

Conclusions: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
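
The quantities the abstract relates to one another (sensitivity, precision, and Mash distance) can be illustrated with a minimal Python sketch. It is not the study's evaluation code. The function names, the `(position, alt_base)` representation of a SNP, and the example values are assumptions made here for illustration. The distance formula is the standard Mash estimate from a Jaccard index, with Mash's default k-mer size of 21 assumed.

```python
import math


def snp_sensitivity_precision(truth, called):
    """Compare called SNPs with a known truth set.

    Each SNP is a hypothetical (position, alt_base) tuple.
    Sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    """
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # true SNPs that were called
    fp = len(called - truth)   # calls that are not true SNPs
    fn = len(truth - called)   # true SNPs that were missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def mash_distance(jaccard, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) for Jaccard index j.

    k=21 is assumed (the Mash default); j = 0 is mapped to distance 1.
    """
    if jaccard <= 0.0:
        return 1.0
    if jaccard >= 1.0:
        return 0.0
    return min(1.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)


if __name__ == "__main__":
    # Made-up example data, not taken from the study.
    truth = {(10, "A"), (20, "C"), (30, "G"), (40, "T")}
    called = {(10, "A"), (20, "C"), (50, "G")}
    sens, prec = snp_sensitivity_precision(truth, called)
    print(f"sensitivity={sens:.3f} precision={prec:.3f}")
    print(f"Mash distance at j=0.5: {mash_distance(0.5):.4f}")
```

In an evaluation of the kind described, one would compute these metrics per pipeline and per reference genome, then examine how they change as the Mash distance between reads and reference grows.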
