Journal: Molecular Biology and Evolution

Rapid Likelihood Analysis on Large Phylogenies Using Partial Sampling of Substitution Histories

Abstract

Likelihood-based approaches can reconstruct evolutionary processes in greater detail and with better precision from larger data sets. The extremely large comparative genomic data sets that are now being generated thus create new opportunities for understanding molecular evolution, but analysis of such large quantities of data poses escalating computational challenges. Recently developed Markov chain Monte Carlo methods that augment substitution histories are a promising approach to alleviate these computational costs. We analyzed the computational costs of several such approaches, considering how they scale with model and data set complexity. This provided a theoretical framework to understand the most important computational bottlenecks, leading us to combine novel variations of our conditional pathway integration approach with recent advances made by others. The resulting technique (“partial sampling” of substitution histories) is considerably faster than all other approaches we considered. It is accurate, simple to implement, and scales exceptionally well with dimensions of model complexity and data set size. In particular, sampling unobserved substitution histories with the new method has much lower time complexity than with previously existing methods, and model parameter and branch length updates are independent of data set size. We compared the performance of methods on a 224-taxon set of mammalian cytochrome-b sequences. For a simple nucleotide substitution model, partial sampling was at least 10 times faster than the PhyloBayes program, which samples substitutions in continuous time, and about 100 times faster than when using fully integrated substitution histories. Under a general reversible model of amino acid substitution, the partial sampling method was 1,600 times faster than when using fully integrated substitution histories, confirming significantly improved scaling with model state-space complexity. Partial sampling of substitutions thus dramatically improves the utility of likelihood approaches for analyzing complex evolutionary processes on large data sets.
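
The claim that parameter updates become independent of data set size can be illustrated with a generic data-augmentation sketch. This is not the authors' partial sampling algorithm. It only shows the standard result that, once a full substitution history is available, the likelihood of a continuous-time Markov chain depends on the data solely through summary counts and dwell times. The function names, the Jukes-Cantor parameterization (overall rate `mu` split evenly over three targets), and the omission of the root-state term are all assumptions made for illustration.

```python
import math

def jc_rates(mu, n_states=4):
    """Hypothetical Jukes-Cantor rate matrix: each off-diagonal rate is mu/(n-1)."""
    off = mu / (n_states - 1)
    return [[0.0 if i == j else off for j in range(n_states)]
            for i in range(n_states)]

def path_log_likelihood(counts, dwell, rates):
    """Log-likelihood of a fully observed substitution history.

    counts[i][j]: number of i->j substitutions, summed over sites and branches
    dwell[i]:     total time spent in state i, summed over sites and branches
    rates[i][j]:  instantaneous rate i->j (diagonal entries are ignored)
    The root-state term is omitted (an assumption of this sketch).
    """
    n = len(rates)
    ll = 0.0
    for i in range(n):
        exit_rate = sum(rates[i][j] for j in range(n) if j != i)
        ll -= exit_rate * dwell[i]          # waiting-time contribution
        for j in range(n):
            if j != i and counts[i][j] > 0:
                ll += counts[i][j] * math.log(rates[i][j])  # jump contribution
    return ll

def jc_rate_mle(counts, dwell):
    """Closed-form maximum-likelihood rate under the sketch's Jukes-Cantor model."""
    n_subs = sum(c for i, row in enumerate(counts)
                 for j, c in enumerate(row) if i != j)
    return n_subs / sum(dwell)
```

Evaluating the likelihood for a new value of `mu` touches only the 4 x 4 count table and the four dwell times, so its cost does not grow with the number of sites or taxa once those summaries have been collected.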
