LIPIcs: Leibniz International Proceedings in Informatics

Faster Pan-Genome Construction for Efficient Differentiation of Naturally Occurring and Engineered Plasmids with Plaster


Abstract

As sequence databases grow, characterizing diversity across extremely large collections of genomes requires efficient methods that avoid costly all-vs-all comparisons [Marschall et al., 2018]. In addition to exponential increases in the number of natural genomes being sequenced, improved techniques for creating human-engineered sequences are ushering in a new wave of synthetic genome sequence databases that grow alongside naturally occurring genome databases. In this paper, we analyze the full diversity of available sequenced natural and synthetic plasmid genome sequences. This diversity can be represented by a pan-genome, a data structure that captures all presently available nucleotide sequences. In our case, we construct a single linear pan-genome nucleotide sequence that captures this diversity. To process such a large number of sequences, we introduce the plaster algorithmic pipeline. Using plaster, we are able to construct the full synthetic plasmid pan-genome from 51,047 synthetic plasmid sequences as well as a natural pan-genome from 6,642 natural plasmid sequences. We demonstrate the efficacy of plaster by comparing its speed against another pan-genome construction method and by showing that nearly all plasmids align well to their corresponding pan-genome. Finally, we explore the use of pan-genome sequence alignment to distinguish between naturally occurring and synthetic plasmids. We believe this approach will lead to new techniques for rapid characterization of engineered plasmids. Applications of this work include detecting genome editing, tracing an unknown plasmid back to its lab of origin, and identifying naturally occurring sequences that may be of use to the synthetic biology community. The source code for plaster, and for fully reconstructing the natural and synthetic plasmid pan-genomes, is publicly available and can be downloaded at https://gitlab.com/qiwangrice/plaster.git.
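To make the idea of a linear pan-genome concrete, here is a toy Python sketch. It is not the plaster pipeline: it assumes exact k-mer matching as a crude stand-in for sequence alignment, and every function name (`build_linear_pangenome`, `coverage`, `classify`) is hypothetical and chosen only for illustration. Each new sequence contributes only the stretches not already represented, and a query is then labelled by which pan-genome covers more of its k-mers.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def uncovered_segments(seq, known, k):
    """Return maximal runs of seq not covered by any k-mer in `known`."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in known:
            for j in range(i, i + k):
                covered[j] = True
    segments, start = [], None
    for i, flag in enumerate(covered):
        if not flag and start is None:
            start = i                      # a novel run begins
        elif flag and start is not None:
            segments.append(seq[start:i])  # a novel run ends
            start = None
    if start is not None:
        segments.append(seq[start:])
    return segments


def build_linear_pangenome(sequences, k=4):
    """Append only the novel parts of each sequence to one linear string.

    Assumption: exact k-mer hits approximate 'already represented';
    a real pipeline would use alignment instead.
    """
    pan = ""
    for seq in sequences:
        known = kmer_set(pan, k)
        for segment in uncovered_segments(seq, known, k):
            pan += segment
    return pan


def coverage(query, pan, k=4):
    """Fraction of the query's k-mer positions that occur in the pan-genome."""
    total = len(query) - k + 1
    if total <= 0:
        return 0.0
    known = kmer_set(pan, k)
    hits = sum(1 for i in range(total) if query[i:i + k] in known)
    return hits / total


def classify(query, natural_pan, synthetic_pan, k=4):
    """Label a query by whichever pan-genome covers more of it."""
    nat = coverage(query, natural_pan, k)
    syn = coverage(query, synthetic_pan, k)
    if nat == syn:
        return "ambiguous"
    return "natural" if nat > syn else "synthetic"


natural_pan = build_linear_pangenome(["AAAACCCCGGGG", "CCCCGGGGTTTT"])
synthetic_pan = build_linear_pangenome(["ATATATATGCGC"])
print(natural_pan)
print(classify("CCCCGGGGTT", natural_pan, synthetic_pan))
```

In this sketch the second natural sequence adds only its novel tail, because its first eight bases are already covered by the growing pan-genome. That is what lets incremental construction avoid all-vs-all comparisons: each sequence is compared once against a single reference string.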
