Journal: BMC Bioinformatics

A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy



Abstract

Background: Species-level classification of 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for these sequences either do not provide species-level classification or produce unreliable results. The unreliability stems from limitations of the existing methods, which either lack solid probability-based criteria for evaluating the confidence of their taxonomic assignments or use nucleotide k-mer frequency as a proxy for sequence similarity.

Results: We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species level to the phylum level based on the lowest common ancestors of multiple database hits for each query sequence, and the reliability of each classification is then evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based on the degree of sequence similarity between the database hit and the query sequence. Our method does not need training datasets specific to different taxonomic groups. Only a reference database is required for alignment against the query sequences, which makes the method easily applicable to different regions of the 16S rRNA gene or to other phylogenetic marker genes.

Conclusions: Reliable species-level classification of 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools, and it provides probability-based confidence scores for evaluating the reliability of its taxonomic assignments based on multiple database matches to each query sequence. Despite its higher computational cost, our method is still suitable for analyzing large-scale microbiome datasets in practice. Furthermore, our method can be applied to the taxonomic classification of any phylogenetic marker gene sequence. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
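The weighting idea described in the abstract can be illustrated with a toy sketch. This is not BLCA's implementation: the function names, the shortened rank list, the example hits, and the choice of a uniform prior with a likelihood proportional to the similarity score are all assumptions made for illustration, and the bootstrap resampling step is omitted. Each database hit votes for its taxon at every rank, with a vote weight equal to its normalized similarity.

```python
from collections import defaultdict

# Shortened rank list for illustration only (hypothetical).
RANKS = ("phylum", "genus", "species")


def posterior_weights(similarities):
    """Turn similarity scores into weights that sum to 1.

    Assumes a uniform prior over hits and a likelihood proportional to
    the similarity score; the abstract does not specify either choice.
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarity scores must sum to a positive value")
    return [s / total for s in similarities]


def weighted_support(hits):
    """Return {rank: (best_taxon, support)} for a list of hits.

    Each hit is a (similarity, lineage) pair, where lineage maps every
    rank in RANKS to a taxon name.
    """
    weights = posterior_weights([similarity for similarity, _ in hits])
    result = {}
    for rank in RANKS:
        support = defaultdict(float)
        for weight, (_, lineage) in zip(weights, hits):
            support[lineage[rank]] += weight
        best = max(support, key=support.get)
        result[rank] = (best, support[best])
    return result


# Hypothetical hits for one query sequence (similarity as a fraction).
hits = [
    (0.99, {"phylum": "Firmicutes", "genus": "Lactobacillus",
            "species": "Lactobacillus crispatus"}),
    (0.99, {"phylum": "Firmicutes", "genus": "Lactobacillus",
            "species": "Lactobacillus crispatus"}),
    (0.97, {"phylum": "Firmicutes", "genus": "Lactobacillus",
            "species": "Lactobacillus jensenii"}),
]

result = weighted_support(hits)
for rank in RANKS:
    taxon, support = result[rank]
    print(f"{rank}: {taxon} (support {support:.3f})")
```

In this sketch the support at higher ranks is at least as large as at the species level, because hits that disagree on species still agree on genus and phylum. That mirrors the species-to-phylum assignment described above.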
