Journal: BMC Genomics

A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes

Abstract

Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype, and used for analysis, is therefore necessary.
The performance of de novo short-read assembly followed by automatic annotation using the pubMLST.org Neisseria database was evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and fewer than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Core genome comparisons and ribosomal protein gene analysis yielded genealogical relationships that were compatible with those identified by multilocus sequence typing but at a higher resolution, and revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
De novo assembly, combined with automated gene-by-gene annotation, generates high-quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or comparisons of multiple genomes, and is a practical way to interpret whole genome sequencing (WGS) data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.
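
The gene-by-gene idea summarised in the abstract can be illustrated with a minimal sketch. It is not the Genome Comparator implementation; it only assumes that each isolate is represented as a mapping from locus name to an allele identifier (or `None` when no allele was designated). The function names, the data layout, and the toy isolates and loci below are all hypothetical. The sketch shows (1) how a "core" set of loci could be selected with a presence threshold such as the 95% mentioned above, and (2) how two isolates could then be compared by counting loci with differing alleles.

```python
from typing import Dict, List, Optional

# Hypothetical layout: isolate name -> {locus name: allele id, or None if not found}
Profile = Dict[str, Optional[int]]
Profiles = Dict[str, Profile]


def core_loci(profiles: Profiles, threshold: float = 0.95) -> List[str]:
    """Return loci with a designated allele in at least `threshold` of isolates."""
    n = len(profiles)
    all_loci = sorted({locus for p in profiles.values() for locus in p})
    core = []
    for locus in all_loci:
        present = sum(1 for p in profiles.values() if p.get(locus) is not None)
        if present / n >= threshold:
            core.append(locus)
    return core


def allelic_distance(a: Profile, b: Profile, loci: List[str]) -> int:
    """Count loci where both isolates have an allele and the alleles differ."""
    return sum(
        1
        for locus in loci
        if a.get(locus) is not None
        and b.get(locus) is not None
        and a[locus] != b[locus]
    )


# Toy data (made up for illustration only)
profiles: Profiles = {
    "iso1": {"A": 1, "B": 2, "C": None},
    "iso2": {"A": 1, "B": 3, "C": 5},
    "iso3": {"A": 2, "B": 3},
}

core = core_loci(profiles)  # loci present in every toy isolate
d12 = allelic_distance(profiles["iso1"], profiles["iso2"], core)
d13 = allelic_distance(profiles["iso1"], profiles["iso3"], core)
```

Loci missing from either isolate are skipped rather than counted as differences, which is one possible convention among several; a pairwise distance matrix built this way could then feed a standard tree or network method.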