BMC Genomics

Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains

Abstract

Without knowledge of their genomic sequences, it is impossible to build functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, and this incompleteness causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is predominant in the human orodigestive microbiota. We identified the genetic loci responsible for assembly breaks and misassemblies and demonstrated the importance and usefulness of long-read sequencing and curated reannotation.

We showed that the fragmentation of Bacteroidia draft genomes assembled from massively parallel sequencing correlates linearly with genomic repeats that are as long as or longer than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (and thus considered complete or finished). We showed that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element, or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains.

In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even those marked complete) and in generating assemblies for new bacterial strains or species with high genomic plasticity. We also show that, when combined with biological validation and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.
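
The link between read length and assembly breaks can be illustrated with a toy example. The sketch below is not the authors' pipeline. It simply lists exact substrings of a chosen length that occur more than once in a sequence. The function name `repeated_kmers`, the sequence `toy_genome`, and the read lengths are hypothetical, chosen only for illustration. A read no longer than a repeat cannot tell the repeat's copies apart, whereas a read that spans the repeat plus unique flanking sequence can.

```python
from collections import defaultdict


def repeated_kmers(genome: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Hypothetical toy sequence: the 8-base unit "ACGTACGA" appears twice,
# separated and flanked by unique sequence.
toy_genome = "TTGCA" + "ACGTACGA" + "GGCTTAGC" + "ACGTACGA" + "CCATG"

# Reads shorter than the repeat (6 bases): several 6-mers are ambiguous.
short_read_repeats = repeated_kmers(toy_genome, 6)

# Reads longer than the repeat (9 bases): every 9-mer is unique.
long_read_repeats = repeated_kmers(toy_genome, 9)

print(short_read_repeats)
print(long_read_repeats)
```

In this toy case, the 6-base windows that fall entirely inside the repeated unit are ambiguous, while every 9-base window includes unique flanking sequence. Real genomes also contain inexact and much longer repeats (such as rrn operons), which this exact-match sketch does not model.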