...
Journal: Systematic Biology

Mixture Models of Nucleotide Sequence Evolution that Account for Heterogeneity in the Substitution Process Across Sites and Across Lineages



Abstract

Molecular phylogenetic studies of homologous sequences of nucleotides often assume that the underlying evolutionary process was globally stationary, reversible, and homogeneous (SRH), and that a model of evolution with one or more site-specific and time-reversible rate matrices (e.g., the GTR rate matrix) is enough to accurately model the evolution of data over the whole tree. However, an increasing body of data suggests that evolution under these conditions is an exception, rather than the norm. To address this issue, several non-SRH models of molecular evolution have been proposed, but they either ignore heterogeneity in the substitution process across sites (HAS) or assume it can be modeled accurately using the GAMMA distribution. As an alternative to these models of evolution, we introduce a family of mixture models that approximate HAS without the assumption of an underlying predefined statistical distribution. This family of mixture models is combined with non-SRH models of evolution that account for heterogeneity in the substitution process across lineages (HAL). We also present two algorithms for searching model space and identifying an optimal model of evolution that is less likely to over- or underparameterize the data. The performance of the two new algorithms was evaluated using alignments of nucleotides with 10 000 sites simulated under complex non-SRH conditions on a 25-tipped tree. The algorithms were found to be very successful, identifying the correct HAL model with a 75% success rate (the average success rate for assigning rate matrices to the tree's 48 edges was 99.25%) and, for the correct HAL model, identifying the correct HAS model with a 98% success rate. Finally, parameter estimates obtained under the correct HAL-HAS model were found to be accurate and precise.
The merits of our new algorithms were illustrated with an analysis of 42 337 second codon sites extracted from a concatenation of 106 alignments of orthologous genes encoded by the nuclear genomes of Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. castellii, S. kluyveri, S. bayanus, and Candida albicans. Our results show that second codon sites in the ancestral genome of these species contained 49.1% invariable sites, 39.6% variable sites belonging to one rate category (V_1), and 11.3% variable sites belonging to a second rate category (V_2). The ancestral nucleotide content was found to differ markedly across these three sets of sites, and the evolutionary processes operating at the variable sites were found to be non-SRH and best modeled by a combination of eight edge-specific rate matrices (four for V_1 and four for V_2). The number of substitutions per site at the variable sites also differed markedly, with sites belonging to V_1 evolving more slowly than those belonging to V_2 along the lineages separating the seven species of Saccharomyces. Finally, sites belonging to V_1 appeared to have ceased evolving along the lineages separating S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, implying that they might have become so selectively constrained that they could be considered invariable sites in these species.
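
As a rough illustration of how a site-class mixture of this kind combines its components, the sketch below weights per-class likelihoods for a single alignment column by the class proportions reported above (49.1% invariable, 39.6% V_1, 11.3% V_2). It is not the authors' implementation: the function names and the per-class likelihood values are hypothetical, and in a real analysis each per-class likelihood would come from a tree-based calculation under that class's edge-specific rate matrices, which is not shown here.

```python
import math

# Class proportions reported in the abstract for second codon sites.
WEIGHTS = {"invariable": 0.491, "V_1": 0.396, "V_2": 0.113}


def site_log_likelihood(class_likelihoods, weights=WEIGHTS):
    """Log of the weighted sum of per-class likelihoods for one column."""
    total = sum(weights[name] * class_likelihoods[name] for name in weights)
    return math.log(total)


def class_posteriors(class_likelihoods, weights=WEIGHTS):
    """Posterior probability that the column belongs to each class."""
    total = sum(weights[name] * class_likelihoods[name] for name in weights)
    return {
        name: weights[name] * class_likelihoods[name] / total
        for name in weights
    }


# Hypothetical per-class likelihoods for one column that varies across taxa.
# An invariable class cannot generate a varying column, so its likelihood is 0.
example = {"invariable": 0.0, "V_1": 2.0e-4, "V_2": 8.0e-4}

if __name__ == "__main__":
    print(site_log_likelihood(example))
    print(class_posteriors(example))
```

Because the mixture is a plain weighted sum, no predefined distribution such as GAMMA is needed to describe rate variation across sites; the weights and the per-class parameters are simply estimated from the data.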
