Journal: Bioinformatics

Analyzing taxonomic classification using extensible Markov models.


Abstract

MOTIVATION: As next-generation sequencing rapidly adds new genomes, their correct placement in the taxonomy needs verification. However, the current methods for confirming the classification of a taxon, or for suggesting a revision of a potential misplacement, rely on computationally intense multi-sequence alignment followed by an iterative adjustment of the distance matrix. Due to intra-heterogeneity issues with the 16S rRNA marker, no classifier is available at the sub-genus level that could readily suggest a classification for a novel 16S rRNA sequence. Metagenomics further complicates the issue by generating fragmented 16S rRNA sequences. This article proposes a novel alignment-free method for representing microbial profiles using extensible Markov models (EMMs) with an extended Karlin-Altschul statistical framework similar to the classic alignment paradigm. We propose a log-odds (LOD) score classifier based on the Gumbel difference distribution that confirms correct classifications with statistical significance qualifications and suggests revisions where necessary.

RESULTS: We tested our method by generating a sub-genus level classifier, with which we re-evaluated the classifications of 676 microbial organisms using the NCBI FTP database for the 16S rRNA. The results confirm the current classification for all genera while ascertaining significance at 95%. Furthermore, this novel classifier isolates heterogeneity issues to a mere 12 strains while confirming classifications with significance qualification for the remaining 98%. The models require less memory than multi-sequence alignments and have better time complexity than the current methods. The classifier operates at the sub-genus level, and thus outperforms the naive Bayes classifier of the RNA Database Project, where much of the taxonomic analysis is available online. Finally, using information redundancy in model building, we show that the method applies to metagenomic fragment classification of 19 Escherichia coli strains.

Availability and implementation: Source code and binaries are freely available for download at http://lyle.smu.edu/IDA/EMMSA/, implemented in Java and supported on MS Windows.
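
The idea of alignment-free log-odds scoring mentioned in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' EMM implementation and does not reproduce the Gumbel-based significance test. It assumes a plain first-order Markov chain over nucleotides per taxon, a uniform background of 0.25, and add-one smoothing. The function names (`train_markov`, `log_odds`, `classify`) and the toy sequences in the usage note are hypothetical.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a).

    Assumption: a plain Markov chain stands in for the paper's EMM;
    pseudocounts keep unseen transitions from getting zero probability.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model


def log_odds(seq, model, background=0.25):
    """Sum log2(P_model(b | a) / background) over all transitions in seq.

    Transitions involving symbols outside ACGT are skipped.
    """
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in model and b in model[a]:
            score += math.log2(model[a][b] / background)
    return score


def classify(seq, models):
    """Return the name of the highest-scoring model and all scores."""
    scores = {name: log_odds(seq, m) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with `models = {"taxon_A": train_markov(["ACACACACAC"]), "taxon_B": train_markov(["GGTTGGTTGGTT"])}`, calling `classify("ACACAC", models)` selects `taxon_A`, because the query's transitions are frequent in that model and absent from the other. A real classifier would still need a significance threshold on the score, which this sketch omits.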
