
Searching for Convergence in Phylogenetic Markov Chain Monte Carlo

Abstract

Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, δ and ε, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a “metachain” to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely.
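The "metachain" idea in the abstract (pooling samples from several short, replicated runs before estimating bipartition posterior probabilities) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tree representation (a topology reduced to a set of splits, each split stored as one canonical side), the function names `bipartition_frequencies` and `metachain`, the `burn_in` parameter, and the toy data are all assumptions made for illustration. The statistics δ and ε are not defined in the abstract and are not reproduced here.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List

# Assumed representation: a split is stored as one (canonical) side of the
# bipartition, and a sampled tree is the set of its nontrivial splits.
Bipartition = FrozenSet[str]
Tree = FrozenSet[Bipartition]


def bipartition_frequencies(samples: Iterable[Tree]) -> Dict[Bipartition, float]:
    """Fraction of sampled trees containing each bipartition
    (the usual sample-based estimate of its posterior probability)."""
    samples = list(samples)
    counts = Counter(split for tree in samples for split in tree)
    return {split: n / len(samples) for split, n in counts.items()}


def metachain(runs: List[List[Tree]], burn_in: int = 0) -> List[Tree]:
    """Concatenate the post-burn-in samples of several replicated runs."""
    return [tree for run in runs for tree in run[burn_in:]]


# Hypothetical toy data: five taxa, two competing topologies.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
de = frozenset({"D", "E"})
t1 = frozenset({ab, de})
t2 = frozenset({ac, de})

# Three short replicated runs; the first sample of each is treated as burn-in.
run1 = [t2, t1, t1, t1]
run2 = [t2, t1, t2, t1]
run3 = [t1, t1, t1, t2]

pooled = bipartition_frequencies(metachain([run1, run2, run3], burn_in=1))
per_run = [bipartition_frequencies(run[1:]) for run in (run1, run2, run3)]

for split, freq in sorted(pooled.items(), key=lambda kv: -kv[1]):
    print(sorted(split), round(freq, 3))
```

Comparing `per_run` against `pooled` gives a rough sense of how much individual short runs disagree. In real analyses the samples would come from the tree files written by the MCMC software rather than from hand-built sets.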

Bibliographic details

  • Source
    Systematic Biology | 2006, Issue 4 | pp. 553-565 | 13 pages
  • Author affiliations

    ARC Centre in Bioinformatics and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia; E-mail: r.beiko{at}gmail.com (R.G.B.); and ARC Centre in Bioinformatics;

    Department of Mathematics, The University of Queensland, Brisbane, Australia;

  • Original format: PDF
  • Text language: English
