Conference: Annual International Conference on Research in Computational Molecular Biology

Accurate Reconstruction of Microbial Strains from Metagenomic Sequencing Using Representative Reference Genomes



Abstract

Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying the sequencing reads into taxonomic groups. Current methods compare these sequencing data with existing reference databases, which are biased and limited. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both shortcomings are especially problematic for the identification of low-abundance microbial species, e.g., detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.
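To make the idea of incrementally grouping reference genomes by similarity more concrete, here is a minimal Python sketch. It is not SPARSE's implementation: the class name `IncrementalClusters`, the k-mer Jaccard similarity, the thresholds, and the greedy "join the most similar representative" rule are all assumptions chosen for illustration, and the toy strings stand in for real genomes. Each level is clustered independently, so the sketch does not enforce strict nesting between levels as a true hierarchy would.

```python
from typing import Dict, List, Sequence, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusters:
    """Hypothetical multi-level greedy clustering, updated one genome at a time.

    thresholds[i] is the minimum similarity needed to join an existing
    cluster at level i (stricter levels first, looser levels later).
    """

    def __init__(self, thresholds: Sequence[float] = (0.9, 0.5), k: int = 4):
        self.thresholds = tuple(thresholds)
        self.k = k
        # One list of (representative name, k-mer set) per level.
        self.representatives: List[List[Tuple[str, Set[str]]]] = [
            [] for _ in self.thresholds
        ]
        self.assignments: Dict[str, Tuple[str, ...]] = {}

    def add(self, name: str, seq: str) -> Tuple[str, ...]:
        """Insert one genome; return its cluster representative at each level."""
        profile = kmers(seq, self.k)
        labels = []
        for level, threshold in enumerate(self.thresholds):
            reps = self.representatives[level]
            best_name, best_sim = None, -1.0
            for rep_name, rep_profile in reps:
                sim = jaccard(profile, rep_profile)
                if sim > best_sim:
                    best_name, best_sim = rep_name, sim
            if best_name is not None and best_sim >= threshold:
                labels.append(best_name)       # join the closest existing cluster
            else:
                reps.append((name, profile))   # found a new cluster at this level
                labels.append(name)
        self.assignments[name] = tuple(labels)
        return self.assignments[name]


index = IncrementalClusters(thresholds=(0.9, 0.5), k=3)
index.add("a", "ACGTACGTAC")   # first genome founds a cluster at both levels
index.add("d", "ACGTACGG")     # similarity 0.8 to "a": separate at 0.9, merged at 0.5
```

Because only cluster representatives are stored and compared, a new genome can be added without re-clustering the genomes already indexed, which is the property the word "incremental" refers to in this sketch.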
