Taxonomic Identity Resolution of Highly Phylogenetically Related Strains and Selection of Phylogenetic Markers by Using Genome-Scale Methods: The Bacillus pumilus Group Case

Abstract

Bacillus pumilus group strains have been studied due to their agronomic, biotechnological or pharmaceutical potential. Classifying strains of this taxonomic group at the species level is challenging because the group comprises seven species that share more than 99.5% 16S rRNA gene identity. In this study, a whole-genome in silico approach was first used to accurately demarcate B. pumilus group strains, as a case of highly phylogenetically related taxa, at the species level. To achieve this, and thereby validate or correct the taxonomic identities of genomes in public databases, an average nucleotide identity correlation analysis, a core-based phylogenomic analysis and a gene function repertoire analysis were performed. In the end, more than 50% of these genomes were found to be misclassified. Hierarchical clustering of gene functional repertoires was also used to infer ecotypes among B. pumilus group species. Furthermore, for the first time, the machine-learning algorithm Random Forest was used to rank genes in order of their importance for species classification. We found that ybbP, a gene involved in the synthesis of cyclic di-AMP, was the most important gene for accurately predicting species identity among B. pumilus group strains. Finally, principal component analysis was used to classify strains based on the distances between their ybbP genes. The methodologies described could be used more broadly to identify other highly phylogenetically related species in metagenomic or epidemiological assessments.
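
The identity figures above (16S rRNA identity, average nucleotide identity) rest on a simple quantity: the share of matching positions between aligned sequences. The sketch below is a toy illustration only, not the pipeline used in the study. It assumes the fragment pairs are already aligned to equal length, averages them without length weighting, and uses a hypothetical 95% cutoff that does not come from this abstract. The function names are also made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mean_identity(fragment_pairs) -> float:
    """Unweighted mean identity over (fragment_a, fragment_b) pairs.

    Assumption: a real ANI tool would find and align the fragments
    itself; here they are supplied already aligned.
    """
    values = [percent_identity(a, b) for a, b in fragment_pairs]
    if not values:
        raise ValueError("no fragment pairs given")
    return sum(values) / len(values)


def same_species(fragment_pairs, threshold: float = 95.0) -> bool:
    """Apply a hypothetical species cutoff (assumed, not from the abstract)."""
    return mean_identity(fragment_pairs) >= threshold


if __name__ == "__main__":
    pairs = [("ACGT", "ACGA"), ("AAAA", "AAAA")]
    print(mean_identity(pairs), same_species(pairs))
```

A marker gene such as 16S rRNA can show very high identity between two strains even when the genome-wide average is lower, which is why the abstract turns to whole-genome comparisons for species demarcation.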