Journal: Molecular Biology and Evolution

Testing the Infinitely Many Genes Model for the Evolution of the Bacterial Core Genome and Pangenome


Abstract

When groups of related bacterial genomes are compared, the number of core genes found in all genomes is usually much less than the mean genome size, whereas the size of the pangenome (the set of genes found on at least one of the genomes) is much larger than the mean size of one genome. We analyze 172 complete genomes of Bacilli and compare the properties of the pangenomes and core genomes of monophyletic subsets taken from this group. We then assess the capabilities of several evolutionary models to predict these properties. The infinitely many genes (IMG) model is based on the assumption that each new gene can arise only once. The predictions of the model depend on the shape of the evolutionary tree that underlies the divergence of the genomes. We calculate results for coalescent trees, star trees, and arbitrary phylogenetic trees of predefined fixed branch length. On a star tree, the pangenome size increases linearly with the number of genomes, as has been suggested in some previous studies, whereas on a coalescent tree, it increases logarithmically. The coalescent tree gives a better fit to the data, for all the examples we consider. In some cases, a fixed phylogenetic tree proved better than the coalescent tree at reproducing structure in the gene frequency spectrum, but little improvement was gained in predictions of the core and pangenome sizes. Most of the data are well explained by a model with three classes of gene: an essential class that is found in all genomes, a slow class whose rate of origination and deletion is slow compared with the time of divergence of the genomes, and a fast class showing rapid origination and deletion. Although the majority of genes originating in a genome are in the fast class, these genes are not retained for long periods, and the majority of genes present in a genome are in the slow or essential classes. 
In general, we show that the IMG model is useful for comparison with experimental genome data, for both species-level and widely divergent taxonomic groups.
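The contrast between linear and logarithmic pangenome growth can be illustrated with a minimal sketch. It is not the authors' code, and the parameterization is an assumption: the function names, argument names, and numerical values are hypothetical and not fitted to any data. For the star tree, every genome is assumed to descend from a stationary root along a branch of equal length, with genes lost independently at a constant rate. For the coalescent tree, the sketch uses the standard IMG closed form, assuming genes are gained at rate theta/2 and lost at rate rho/2 per lineage, with each pair of lineages coalescing at rate 1.

```python
import math


def pangenome_star(n, mean_genome, loss_rate, branch_length):
    """Expected pangenome size for n genomes on a star tree (assumed setup).

    mean_genome is the stationary genome size (gain rate / loss rate).
    """
    # Probability that a gene present at the root survives along one branch.
    survive = math.exp(-loss_rate * branch_length)
    # Root genes still present in at least one of the n genomes.
    ancestral = mean_genome * (1.0 - (1.0 - survive) ** n)
    # Genes that arose on one branch and are still present at its tip.
    new_per_genome = mean_genome * (1.0 - survive)
    return ancestral + n * new_per_genome


def pangenome_coalescent(n, theta, rho):
    """Expected pangenome size for n genomes on a coalescent tree.

    Assumes gain rate theta/2 and loss rate rho/2 per lineage, and
    pairwise coalescence at rate 1; mean genome size is theta / rho.
    """
    return theta * sum(1.0 / (rho + i) for i in range(n))


if __name__ == "__main__":
    # Arbitrary illustrative values: both models give a mean genome of 3000.
    for n in (1, 10, 100, 1000):
        star = pangenome_star(n, mean_genome=3000.0, loss_rate=0.5,
                              branch_length=1.0)
        coal = pangenome_coalescent(n, theta=1500.0, rho=0.5)
        print(n, round(star), round(coal))
```

Under these assumptions, each additional genome on the star tree eventually contributes a constant number of new genes, so the pangenome grows linearly. On the coalescent tree, the n-th genome contributes roughly theta/n new genes, so the total grows like theta times the logarithm of n.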
