Conference: International Conference on Comparative Genomics
NJMerge: A Generic Technique for Scaling Phylogeny Estimation Methods and Its Application to Species Trees

Abstract

Divide-and-conquer methods, which divide the species set into overlapping subsets, construct trees on the subsets, and then combine the trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of these approaches. In this paper, we present a new divide-and-conquer approach that does not require supertree estimation: we divide the species set into disjoint subsets, construct trees on the subsets, and then combine the trees using a distance matrix computed on the full species set. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm. We report on the results of an extensive simulation study evaluating NJMerge's utility in scaling three popular species tree estimation methods: ASTRAL, SVDquartets, and concatenation analysis using RAxML. We find that NJMerge provides substantial improvements in running time without sacrificing accuracy and sometimes even improves accuracy. Furthermore, although NJMerge can sometimes fail to return a tree, the failure rate in our experiments is less than 1%. Together, these results suggest that NJMerge is a valuable technique for scaling computationally intensive methods to larger datasets, especially when computational resources are limited. NJMerge is freely available on GitHub: https://github.com/ekmolloy/njmerge. All datasets, scripts, and supplementary materials are freely available through the Illinois Data Bank: https://doi.org/10.13012/B2IDB-1424746_V1.
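
For readers unfamiliar with the base algorithm that NJMerge extends, the following is a minimal sketch of classic Neighbor Joining on a full distance matrix. It is not NJMerge and is not taken from the authors' repository: the way NJMerge uses the subset trees during merging is not reproduced here, and the function name `neighbor_joining`, its arguments, and the toy matrix in the usage note are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Classic Neighbor Joining on a symmetric distance matrix.

    Illustrative sketch only (hypothetical helper, not the NJMerge code).
    Returns the unrooted topology as a Newick string without branch lengths.
    """
    if len(names) < 2:
        raise ValueError("need at least two taxa")

    # Active nodes are integer ids; newick[i] is the subtree string for node i.
    newick = {i: name for i, name in enumerate(names)}
    dist = {
        i: {j: float(matrix[i][j]) for j in range(len(names)) if j != i}
        for i in range(len(names))
    }
    next_id = len(names)

    while len(dist) > 2:
        n = len(dist)
        row_sum = {i: sum(row.values()) for i, row in dist.items()}

        def q(pair):
            # Standard NJ selection criterion; the minimizing pair is joined.
            a, b = pair
            return (n - 2) * dist[a][b] - row_sum[a] - row_sum[b]

        a, b = min(combinations(dist, 2), key=q)

        # Distances from the new internal node to every remaining node.
        others = [c for c in dist if c not in (a, b)]
        new_row = {
            c: 0.5 * (dist[a][c] + dist[b][c] - dist[a][b]) for c in others
        }

        # Replace a and b with the new internal node.
        del dist[a], dist[b]
        for c in others:
            del dist[c][a], dist[c][b]
            dist[c][next_id] = new_row[c]
        dist[next_id] = new_row
        newick[next_id] = f"({newick[a]},{newick[b]})"
        next_id += 1

    a, b = dist
    return f"({newick[a]},{newick[b]});"
```

As a usage example, calling `neighbor_joining(["A", "B", "C", "D"], [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]])` on this made-up additive matrix groups A with B and C with D. In the divide-and-conquer pipeline described in the abstract, a matrix of this kind would be computed on the full species set, while the trees estimated on the disjoint subsets would additionally guide the merge.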