Journal: BMC Bioinformatics

A scalable method for identifying frequent subtrees in sets of large phylogenetic trees

Abstract

Background: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees.

Results: We develop a heuristic solution that aims to find MFASTs in sets of many large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees; these serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle datasets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods.

Conclusions: Although this heuristic is not guaranteed to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical datasets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
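To make the notion of a frequent agreement subtree concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes rooted trees written as nested tuples of taxon names, with every tree containing the same taxa. The function names (`restricted_clusters`, `best_support`, `grow`) and the greedy one-taxon-at-a-time extension are illustrative assumptions; the abstract does not specify how seeds are chosen or combined.

```python
from collections import Counter


def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def restricted_clusters(tree, taxa):
    """Clades of `tree` after restricting it to `taxa`.

    Two rooted trees have the same topology on `taxa` exactly when
    these cluster sets are equal.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            below = frozenset([node]) & taxa
        else:
            below = frozenset().union(*(visit(child) for child in node))
        if len(below) > 1:  # ignore empty and single-taxon clusters
            found.add(below)
        return below

    visit(tree)
    return frozenset(found)


def best_support(taxa, trees):
    """Fraction of trees sharing the most common topology on `taxa`."""
    taxa = frozenset(taxa)
    counts = Counter(restricted_clusters(tree, taxa) for tree in trees)
    return counts.most_common(1)[0][1] / len(trees)


def grow(seed, trees, min_freq):
    """Hypothetical greedy extension of a seed taxon set.

    Adds one taxon at a time while the set still agrees in at least
    `min_freq` of the trees. Support can only drop as taxa are added,
    so a taxon rejected once never needs to be retried.
    """
    current = frozenset(seed)
    for taxon in sorted(leaves(trees[0]) - current):
        trial = current | {taxon}
        if best_support(trial, trees) >= min_freq:
            current = trial
    return current


# Toy input: the third tree disagrees with the others on A, B and C.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("E", "D")),
    ((("A", "C"), "B"), ("D", "E")),
]
print(best_support({"A", "B", "C"}, trees))
print(sorted(grow({"A", "D", "E"}, trees, 1.0)))
```

In this toy input, the seed {A, D, E} has the same topology in all three trees. It can be extended with B but not with C, because the trees disagree on how A, B and C are related. The resulting set is maximal in the sense described above: no remaining taxon can be added without falling below the frequency threshold.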
