A scalable method for identifying frequent subtrees in sets of large phylogenetic trees

Avinash Ramu; Tamer Kahveci; J Gordon Burleigh

Abstract

Background: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees.

Results: We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods.

Conclusions: Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
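The three phases described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the tree representation, the seed size, or the merging strategy, so everything below is an assumption made for illustration. Trees are rooted nested tuples over one shared taxon set, seeds are 3-taxon sets, and the names restrict, support, and find_fast are hypothetical. A taxon set counts as frequent when its most common induced topology appears in at least min_support of the input trees.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree, keep):
    """Restrict a rooted tree to the taxa in `keep`.

    Nodes left with one child are suppressed, and children are sorted so
    that equal topologies get equal (canonical) representations.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(sorted(kids, key=repr))


def support(trees, taxa):
    """Fraction of trees that share the most common topology on `taxa`."""
    keep = frozenset(taxa)
    shapes = Counter(restrict(t, keep) for t in trees)
    return shapes.most_common(1)[0][1] / len(trees)


def find_fast(trees, min_support):
    """Greedy three-phase search for a large frequent agreement leaf set."""
    taxa = sorted(leaves(trees[0]))  # assumes every tree has the same taxa
    # Phase 1: small frequent seeds (here, 3-taxon sets; an assumption).
    seeds = [frozenset(c) for c in combinations(taxa, 3)
             if support(trees, c) >= min_support]
    best = frozenset()
    for current in seeds:
        # Phase 2: merge in other seeds while the union stays frequent.
        for seed in seeds:
            merged = current | seed
            if support(trees, merged) >= min_support:
                current = merged
        # Phase 3: add single taxa so the result is not contained in a
        # larger frequent set. Support never increases as taxa are added,
        # so one pass is enough.
        for taxon in taxa:
            if taxon not in current and \
                    support(trees, current | {taxon}) >= min_support:
                current = current | {taxon}
        if len(current) > len(best):
            best = current
    return best


# Three trees on five taxa in which only the placement of D disagrees.
trees = [
    (((("A", "B"), "C"), "D"), "E"),
    (((("A", "B"), "D"), "C"), "E"),
    (((("A", "B"), "C"), "E"), "D"),
]
print(sorted(find_fast(trees, min_support=1.0)))  # ['A', 'B', 'C', 'E']
```

The sketch enumerates every 3-taxon set and recomputes support from scratch, so it only suits very small inputs. It is meant to show the seed, combine, and post-process structure, not the scalability the abstract reports.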

Bibliographic details

Source: BMC Bioinformatics, 2012, Issue 1
Authors: Avinash Ramu; Tamer Kahveci; J Gordon Burleigh
Original format: PDF
CLC classification: Biological sciences
