Journal: mSystems

MetaPalette: a k-mer Painting Approach for Metagenomic Taxonomic Profiling and Quantification of Novel Strain Variation


Abstract

Metagenomic profiling is challenging in part because of the highly uneven sampling of the tree of life by genome sequencing projects and the limitations imposed by performing phylogenetic inference at fixed taxonomic ranks. We present the algorithm MetaPalette, which uses long k-mer sizes (k = 30, 50) to fit a k-mer "palette" of a given sample to the k-mer palette of reference organisms. By modeling the k-mer palettes of unknown organisms, the method also gives an indication of the presence, abundance, and evolutionary relatedness of novel organisms present in the sample. The method returns a traditional, fixed-rank taxonomic profile which is shown on independently simulated data to be one of the most accurate to date. Tree figures are also returned that quantify the relatedness of novel organisms to reference sequences, and the accuracy of such figures is demonstrated on simulated spike-ins and a metagenomic soil sample. The software implementing MetaPalette is available at: https://github.com/dkoslicki/MetaPalette. Pretrained databases are included for Archaea, Bacteria, Eukaryota, and viruses.

IMPORTANCE: Taxonomic profiling is a challenging first step when analyzing a metagenomic sample. This work presents a method that facilitates fine-scale characterization of the presence, abundance, and evolutionary relatedness of organisms present in a given sample but absent from the training database. We calculate a "k-mer palette" which summarizes the information from all reads, not just those in conserved genes or containing taxon-specific markers. The compositions of palettes are easy to model, allowing rapid inference of community composition. In addition to providing strain-level information where applicable, our approach provides taxonomic profiles that are more accurate than those of competing methods.

Author Video: An author video summary of this article is available.
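To make the idea of comparing a sample's k-mer content against reference organisms concrete, the following is a minimal toy sketch in Python. It is not the MetaPalette algorithm or its API: the function names (`kmer_counts`, `shared_fraction`, `palette_profile`), the toy sequences, and the small k = 5 are all assumptions chosen for illustration, and the score is a plain shared-k-mer fraction rather than the model fitting described in the abstract. Reverse complements and sequencing errors are ignored.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_fraction(sample_counts, ref_counts):
    """Fraction of the reference's distinct k-mers that also occur in the sample."""
    if not ref_counts:
        return 0.0
    shared = sum(1 for kmer in ref_counts if kmer in sample_counts)
    return shared / len(ref_counts)


def palette_profile(reads, references, k):
    """Score each reference by how much of its k-mer content the reads cover.

    Hypothetical helper: a crude stand-in for fitting a sample palette
    to reference palettes, not the published method.
    """
    sample = Counter()
    for read in reads:
        sample.update(kmer_counts(read, k))
    return {
        name: shared_fraction(sample, kmer_counts(seq, k))
        for name, seq in references.items()
    }


# Toy data (invented for this example): the reads are drawn from ref_A only.
references = {"ref_A": "ACGTACGGTCAAGT", "ref_B": "TTGACCGATAGGCT"}
reads = ["ACGTACGGTC", "CGGTCAAGT"]
profile = palette_profile(reads, references, k=5)
print(profile)
```

On this toy input the two reads together cover every 5-mer of `ref_A` and none of `ref_B`, so the scores are 1.0 and 0.0. With real data, a longer k such as the 30 and 50 mentioned in the abstract makes chance matches between unrelated genomes much less likely, at the cost of more sensitivity to mutations and sequencing errors.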
