
Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation



Abstract

A new method is developed for calculating sequence substitution probabilities using Markov chain Monte Carlo (MCMC) methods. The basic strategy is to use uniformization to transform the original continuous time Markov process into a Poisson substitution process and a discrete Markov chain of state transitions. An efficient MCMC algorithm for evaluating substitution probabilities by this approach using a continuous gamma distribution to model site-specific rates is outlined. The method is applied to the problem of inferring branch lengths and site-specific rates from nucleotide sequences under a general time-reversible (GTR) model and a computer program BYPASSR is developed. Simulations are used to examine the performance of the new program relative to an existing program BASEML that uses a discrete approximation for the gamma distributed prior on site-specific rates. It is found that BASEML and BYPASSR are in close agreement when inferring branch lengths, regardless of the number of rate categories used, but that BASEML tends to underestimate high site-specific substitution rates, and to overestimate intermediate rates, when fewer than 50 rate categories are used. Rate estimates obtained using BASEML agree more closely with those of BYPASSR as the number of rate categories increases. Analyses of the posterior distributions of site-specific rates from BYPASSR suggest that a large number of taxa are needed to obtain precise estimates of site-specific rates, especially when rates are very high or very low. The method is applied to analyze 45 sequences of the alpha 2B adrenergic receptor gene (A2AB) from a sample of eutherian taxa. In general, the pattern expected for regions under negative selection is observed with third codon positions having the highest inferred rates, followed by first codon positions and with second codon positions having the lowest inferred rates. 
Several sites show exceptionally high substitution rates at second codon positions that may represent the effects of positive selection.
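The uniformization step mentioned in the abstract rests on a standard identity: for a rate matrix Q with maximum exit rate mu, the transition matrix is P(t) = sum over n of Poisson(n; mu*t) * R^n, where R = I + Q/mu is a discrete-time transition matrix. The sketch below only illustrates that identity numerically. It is not the BYPASSR algorithm, which samples substitution histories by MCMC, and the function names, the tolerance, and the choice of the Jukes-Cantor model as a check are assumptions made for this example.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def jc69_rate_matrix(alpha):
    """Hypothetical helper: Jukes-Cantor rate matrix with off-diagonal rate alpha."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def uniformized_transition_probs(q, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) = exp(Q t) by uniformization.

    The continuous-time chain is rewritten as a Poisson process of rate mu
    whose events each apply one step of the discrete chain R = I + Q / mu.
    The series is truncated once the remaining Poisson mass is below tol.
    Very large mu * t would underflow exp(-mu * t); this sketch ignores that.
    """
    n = len(q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-q[i][i] for i in range(n))  # uniformization rate
    if mu == 0.0:
        return identity  # no state ever changes
    r = [[identity[i][j] + q[i][j] / mu for j in range(n)] for i in range(n)]

    power = identity              # R^0
    weight = math.exp(-mu * t)    # Poisson probability of zero events
    cumulative = weight
    result = [[weight * power[i][j] for j in range(n)] for i in range(n)]

    for k in range(1, max_terms + 1):
        if 1.0 - cumulative <= tol:
            break
        power = mat_mul(power, r)     # R^k
        weight *= mu * t / k          # Poisson probability of k events
        cumulative += weight
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
    return result


if __name__ == "__main__":
    q = jc69_rate_matrix(1.0 / 3.0)
    for row in uniformized_transition_probs(q, 0.5):
        print(["%.6f" % x for x in row])
```

Under Jukes-Cantor the result can be compared with the closed form P_ii(t) = 1/4 + 3/4 * exp(-4 * alpha * t). The same function accepts any valid rate matrix, including a GTR matrix, since it only requires non-negative off-diagonal rates and rows that sum to zero.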

Bibliographic details

  • Source
    Systematic Biology | 2006, Issue 2 | pp. 259-269 | 11 pages
  • Author affiliations

    Department of Medical Genetics, University of Alberta, Edmonton, Alberta, Canada;

    Genome Center and Section of Evolution and Ecology, University of California Davis, One Shields Avenue, Davis, California 95616, USA. E-mail: brannala{at}ucdavis.edu;

  • Original format: PDF
  • Language of text: English
