Microbial Genomics

Detection of plasmid contigs in draft genome assemblies using customized Kraken databases

Abstract

Plasmids play an important role in bacterial evolution and mediate horizontal transfer of genes, including virulence and antimicrobial resistance genes. Although short-read sequencing technologies have enabled large-scale bacterial genomics, the resulting draft genome assemblies are often fragmented into hundreds of discrete contigs. Several tools and approaches have been developed to identify plasmid sequences in such assemblies, but they require a trade-off between sensitivity and specificity. Here we propose using the Kraken classifier, together with a custom Kraken database comprising known chromosomal and plasmid sequences of the Klebsiella pneumoniae species complex (KpSC), to identify plasmid-derived contigs in draft assemblies. We assessed performance using Illumina-based draft genome assemblies for 82 KpSC isolates, for which complete genomes were available to supply the ground truth. When benchmarked against five other classifiers (Centrifuge, RFPlasmid, mlplasmids, PlaScope and Platon), Kraken showed balanced performance in terms of overall sensitivity and specificity (90.8 and 99.4 %, respectively, for contig count; 96.5 and >99.9 %, respectively, for cumulative contig length) and achieved the highest accuracy (96.8 % vs 91.8–96.6 % for contig count; 99.8 % vs 99.0–99.7 % for cumulative contig length) and F1-score (94.5 % vs 84.5–94.1 % for contig count; 98.0 % vs 88.9–96.7 % for cumulative contig length). Kraken also achieved consistent performance across our genome collection. Furthermore, we demonstrate that expanding the Kraken database with additional known chromosomal and plasmid sequences can further improve classification performance. Although we have focused here on the KpSC, this methodology could easily be applied to other species with a sufficient number of completed genomes.
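The following is a minimal Python sketch of the two steps described above, offered under stated assumptions rather than as the authors' pipeline. First, it reads Kraken's per-contig output and calls a contig plasmid-derived when its assigned taxon is one of the custom "plasmid" taxa. Second, it scores those calls against ground-truth labels, both by contig count and by cumulative contig length. Several details are hypothetical. The taxonomy ID 9000001, the helper names, and the choice to treat unclassified contigs as chromosomal are illustrative assumptions. The parser also assumes the standard tab-separated output (C/U flag, sequence ID, taxid, length, k-mer mapping) produced without `--use-names`.

```python
"""Sketch: plasmid-contig calling from Kraken output, plus benchmark metrics."""

# HYPOTHETICAL taxonomy ID(s) given to plasmid sequences when the custom
# database was built; chromosomal sequences would carry a different ID.
PLASMID_TAXIDS = {9000001}


def predict_plasmid_contigs(kraken_lines, plasmid_taxids=PLASMID_TAXIDS):
    """Map contig ID -> (predicted_plasmid, contig_length).

    Assumes Kraken's standard tab-separated per-sequence output:
    C/U flag, sequence ID, taxid, sequence length, k-mer LCA mapping.
    Unclassified ("U") contigs are treated as chromosomal (an assumption).
    """
    predictions = {}
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        status, contig_id, taxid, length = fields[:4]
        is_plasmid = status == "C" and int(taxid) in plasmid_taxids
        predictions[contig_id] = (is_plasmid, int(length))
    return predictions


def confusion(predictions, truth, weighted=False):
    """Return (TP, FP, TN, FN), with plasmid as the positive class.

    weighted=False counts contigs; weighted=True sums contig lengths,
    i.e. the "cumulative contig length" view of performance.
    """
    tp = fp = tn = fn = 0
    for contig_id, (predicted, length) in predictions.items():
        w = length if weighted else 1
        actual = truth[contig_id]  # True if the contig is really plasmid-derived
        if predicted and actual:
            tp += w
        elif predicted:
            fp += w
        elif actual:
            fn += w
        else:
            tn += w
    return tp, fp, tn, fn


def _ratio(num, den):
    # Avoid ZeroDivisionError when a class is absent from the sample.
    return num / den if den else float("nan")


def summarize(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and F1-score from a confusion matrix."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy usage with invented contigs (not data from the study); the last
# column stands in for the k-mer mapping, which this sketch ignores.
kraken_output = [
    "C\tcontig_1\t9000001\t50000\t...",
    "C\tcontig_2\t9000002\t200000\t...",
    "U\tcontig_3\t0\t1000\t...",
    "C\tcontig_4\t9000001\t5000\t...",
]
truth = {"contig_1": True, "contig_2": False, "contig_3": True, "contig_4": False}

preds = predict_plasmid_contigs(kraken_output)
for weighted in (False, True):
    label = "length" if weighted else "count"
    print(label, summarize(*confusion(preds, truth, weighted)))
```

With these toy labels, every count-based metric is 0.5. The length-weighted metrics are much higher because the long contigs are called correctly, which is one reason to report performance both ways.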