Journal: Microbial Genomics

Comprehensive assessment of the quality of Salmonella whole genome sequence data available in public sequence databases using the Salmonella in silico Typing Resource (SISTR)

Abstract

Public health and food safety institutions around the world are adopting whole genome sequencing (WGS) to replace conventional methods for characterizing Salmonella in surveillance and outbreak response. Falling costs and increased throughput of WGS have resulted in an explosion of data, but questions remain as to the reliability and robustness of those data. Because serovar information is critically important to public health, reliable serovar assignments need to be available for all Salmonella records. The current study systematically assessed and curated all Salmonella records in the Sequence Read Archive (SRA) to determine the state of the data and their utility. A total of 67 758 genomes were assembled de novo and quality-assessed for their assembly metrics as well as their species and serovar assignments. A total of 42 400 genomes passed all of the quality criteria, but 30.16 % of genomes were deposited without serotype information. These data were used to compare the concordance of reported and predicted serovars for two in silico prediction tools, multi-locus sequence typing (MLST) and the Salmonella in silico Typing Resource (SISTR), whose predictions were fully concordant with the reported serovar for 87.51 % and 91.91 % of the tested isolates, respectively. Concordance of the in silico predictions increased when serovar variants were grouped together, to 89.25 % for MLST and 94.98 % for SISTR. This study represents the first large-scale validation of serovar information in public genomes and provides a large validated set of genomes that can be used to benchmark new bioinformatics tools.
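The concordance figures in the abstract compare a reported serovar with a predicted one, first strictly and then with serovar variants grouped together. The sketch below illustrates that kind of calculation on toy data. It is not the authors' pipeline: the function names, the example records, and the variant grouping table are all assumptions made for illustration, and records without a reported serovar are simply excluded from the denominator.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical variant grouping (illustrative only, not the study's table):
# each variant name maps to the serovar it is grouped with.
VARIANT_GROUPS: Dict[str, str] = {
    "I 4,[5],12:i:-": "Typhimurium",          # monophasic variant
    "Paratyphi B var. Java": "Paratyphi B",
}


def normalize(name: str) -> str:
    """Lower-case a serovar name and collapse whitespace for comparison."""
    return " ".join(name.strip().lower().split())


def concordance(
    pairs: List[Tuple[str, str]],
    groups: Optional[Dict[str, str]] = None,
) -> float:
    """Percentage of (reported, predicted) pairs that agree.

    Pairs with an empty reported serovar are skipped. If `groups` is
    given, variant names are mapped to their group before comparing.
    """
    table = {normalize(k): normalize(v) for k, v in (groups or {}).items()}

    def canonical(name: str) -> str:
        key = normalize(name)
        return table.get(key, key)

    tested = [(r, p) for r, p in pairs if r and r.strip()]
    if not tested:
        return 0.0
    hits = sum(canonical(r) == canonical(p) for r, p in tested)
    return 100.0 * hits / len(tested)


# Toy records: (reported serovar, predicted serovar). Made-up examples.
records = [
    ("Typhimurium", "Typhimurium"),
    ("Enteritidis", "Enteritidis"),
    ("Typhimurium", "I 4,[5],12:i:-"),
    ("Paratyphi B var. Java", "Paratyphi B"),
    ("Heidelberg", "Newport"),
    ("", "Kentucky"),  # no reported serovar, so it is not tested
]

strict = concordance(records)
grouped = concordance(records, VARIANT_GROUPS)
print(f"strict: {strict:.2f} %, grouped: {grouped:.2f} %")
```

On this toy set, grouping variants raises the agreement because the monophasic and var. Java records stop counting as mismatches, which mirrors the direction of the change reported in the abstract.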