Journal: Molecular Biology and Evolution

A Bayesian Mixture Model for Across-Site Heterogeneities in the Amino-Acid Replacement Process

Abstract

Most current models of sequence evolution assume that all sites of a protein evolve under the same substitution process, characterized by a 20 x 20 substitution matrix. Here, we propose to relax this assumption by developing a Bayesian mixture model that allows the amino-acid replacement pattern at different sites of a protein alignment to be described by distinct substitution processes. Our model, named CAT, assumes the existence of distinct processes (or classes) differing by their equilibrium frequencies over the 20 residues. Through the use of a Dirichlet process prior, the total number of classes and their respective amino-acid profiles, as well as the affiliations of each site to a given class, are all free variables of the model. In this way, the CAT model is able to adapt to the complexity actually present in the data, and it yields an estimate of the substitutional heterogeneity through the posterior mean number of classes. We show that a significant level of heterogeneity is present in the substitution patterns of proteins, and that the standard one-matrix model fails to account for this heterogeneity. By evaluating the Bayes factor, we demonstrate that the standard model is outperformed by CAT on all of the data sets which we analyzed. Altogether, these results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.
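To make the role of the Dirichlet process prior more concrete, the following is a minimal sketch, not the authors' implementation. It draws site-to-class allocations and per-class amino-acid frequency profiles from a Dirichlet process prior using the Chinese restaurant process construction. The function names, the concentration parameter `alpha`, and the symmetric Dirichlet base distribution over profiles are all assumptions made for illustration. The sketch samples from the prior only; it does not compute any likelihood on an alignment or tree, and it performs no posterior inference.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 residues


def sample_profile(rng, concentration=1.0):
    """Draw one equilibrium frequency profile over the 20 residues
    from a symmetric Dirichlet (assumed base distribution)."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in AMINO_ACIDS]
    total = sum(draws)
    return [d / total for d in draws]


def sample_cat_prior(n_sites, alpha=1.0, seed=0):
    """Sample allocations and class profiles from a Dirichlet process
    prior via the Chinese restaurant process (hypothetical helper)."""
    rng = random.Random(seed)
    allocations = []  # class index assigned to each site
    class_sizes = []  # number of sites currently in each class
    profiles = []     # one frequency profile per class
    for _ in range(n_sites):
        # An existing class k is chosen with weight class_sizes[k];
        # a brand-new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(0)
            profiles.append(sample_profile(rng))
        class_sizes[k] += 1
        allocations.append(k)
    return allocations, profiles


if __name__ == "__main__":
    allocations, profiles = sample_cat_prior(n_sites=200, alpha=2.0, seed=1)
    print("number of classes in this draw:", len(profiles))
```

Because the number of classes is not fixed in advance, it varies from draw to draw. In a full analysis, averaging that count over posterior samples (rather than prior samples as here) would correspond to the posterior mean number of classes mentioned in the abstract.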
