Published in: International Workshop on Comparative Genomics

NJMerge: A Generic Technique for Scaling Phylogeny Estimation Methods and Its Application to Species Trees

Abstract

Divide-and-conquer methods, which divide the species set into overlapping subsets, construct trees on the subsets, and then combine the trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of these approaches. In this paper, we present a new divide-and-conquer approach that does not require supertree estimation: we divide the species set into disjoint subsets, construct trees on the subsets, and then combine the trees using a distance matrix computed on the full species set. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm. We report on the results of an extensive simulation study evaluating NJMerge's utility in scaling three popular species tree estimation methods: ASTRAL, SVDquartets, and concatenation analysis using RAxML. We find that NJMerge provides substantial improvements in running time without sacrificing accuracy and sometimes even improves accuracy. Furthermore, although NJMerge can sometimes fail to return a tree, the failure rate in our experiments is less than 1%. Together, these results suggest that NJMerge is a valuable technique for scaling computationally intensive methods to larger datasets, especially when computational resources are limited. NJMerge is freely available on GitHub: https://github.com/ekmolloy/njmerge. All datasets, scripts, and supplementary materials are freely available through the Illinois Data Bank: https://doi.org/10.13012/B2IDB-1424746_V1.
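For readers unfamiliar with the algorithm that NJMerge extends, the following is a minimal sketch of the textbook Neighbor Joining procedure on a distance matrix. It is not NJMerge and is not taken from the authors' code: the step that makes the output agree with the trees built on the disjoint subsets is not shown. The function name `neighbor_joining`, the `internalN` node labels, and the five-taxon distance matrix are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook Neighbor Joining (illustrative sketch, not NJMerge).

    names: list of leaf labels.
    dist:  dict mapping frozenset({x, y}) -> distance between x and y.
    Returns a list of (node, node, branch_length) edges of an unrooted tree.
    """
    nodes = list(names)
    d = dict(dist)          # working copy; grows as internal nodes are added
    edges = []
    next_id = 0             # counter for hypothetical internal node labels

    while len(nodes) > 2:
        n = len(nodes)
        # r[x] = sum of distances from x to every other active node
        r = {x: sum(d[frozenset((x, k))] for k in nodes if k != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j]
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node to i and to j
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"internal{next_id}"
        next_id += 1
        edges.append((u, i, li))
        edges.append((u, j, lj))
        # Distances from the new node to every remaining active node
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Connect the last two active nodes
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical five-taxon example with an additive distance matrix
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
names = ["a", "b", "c", "d", "e"]
dist = {frozenset(p): v for p, v in pairs.items()}
edges = neighbor_joining(names, dist)
```

In the divide-and-conquer setting described in the abstract, a matrix like `dist` would be computed on the full species set, while the expensive tree estimation method is run only on the smaller disjoint subsets.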
