Journal: Genetics (from the U.S. National Institutes of Health literature collection)

Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution

Abstract

Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
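The difference between SBR and AWP counting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sbr_counts` and `awp_counts` are hypothetical, and the joint posterior probabilities for the (ancestor, descendant) states on one branch are made-up toy numbers that, in a real analysis, would come from a fitted substitution model. EMC is not shown.

```python
BASES = "ACGT"


def empty_counts():
    """Return a table of counts for every (ancestor, descendant) base pair."""
    return {(a, b): 0.0 for a in BASES for b in BASES}


def sbr_counts(joint_posteriors):
    """Single best reconstruction: per site, count only the most probable pair."""
    counts = empty_counts()
    for joint in joint_posteriors:
        best_pair = max(joint, key=joint.get)
        counts[best_pair] += 1.0
    return counts


def awp_counts(joint_posteriors):
    """Averaging weighted by posteriors: per site, add each pair's probability."""
    counts = empty_counts()
    for joint in joint_posteriors:
        for pair, prob in joint.items():
            counts[pair] += prob
    return counts


# Toy data (assumed): 10 sites on one branch, each with a 90% posterior
# probability of no change (A -> A) and 10% of an A -> G substitution.
sites = [{("A", "A"): 0.9, ("A", "G"): 0.1} for _ in range(10)]

sbr = sbr_counts(sites)
awp = awp_counts(sites)

print("SBR A->G count:", sbr[("A", "G")])
print("AWP A->G count:", round(awp[("A", "G")], 6))
```

In this toy case SBR records no A→G substitutions because the unchanged state is the best reconstruction at every site, while AWP recovers an expected count of about one. This is the kind of systematic undercounting of less probable changes that the abstract attributes to ignoring suboptimal reconstructions.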