...
Journal: Algorithms for Molecular Biology

Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance


Abstract

Background

Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation.

Results

We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees.

Conclusions

Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets.
Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets in which many processes, such as deep coalescence, recombination, gene duplication and loss, as well as phylogenetic error, may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating that MulRF is an efficient alternative approach for phylogenetic inference from large-scale genomic data sets.
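The method generalizes the classic Robinson-Foulds distance. For two singly-labeled trees on the same taxa, that distance counts the bipartitions (splits) that appear in one tree but not the other. The following is a minimal sketch of that classic, singly-labeled case and of the "total RF distance" objective the abstract describes. It assumes trees are nested Python tuples of taxon names over one shared taxon set. It does not implement the paper's mul-tree generalization or its SPR-based search. The helper names (`splits`, `rf_distance`, `total_rf`, `best_candidate`) are hypothetical, and `best_candidate` only compares a hand-supplied list of candidate trees.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name (str); an internal
# node is a tuple of child subtrees, e.g. ((("A", "B"), "C"), ("D", "E")).
Tree = Union[str, Tuple["Tree", ...]]
Split = FrozenSet[str]


def _collect(tree: Tree, clusters: Set[Split]) -> Split:
    """Post-order walk that records the leaf set below every internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(_collect(child, clusters) for child in tree))
    clusters.add(below)
    return below


def splits(tree: Tree) -> Set[Split]:
    """Non-trivial bipartitions of the tree, read as unrooted.

    Each bipartition {X, taxa - X} is keyed by the side that does NOT contain
    a fixed reference taxon, so both orientations (and the two clusters on
    either side of a root) map to the same key.
    """
    clusters: Set[Split] = set()
    taxa = _collect(tree, clusters)
    ref = min(taxa)
    result: Set[Split] = set()
    for cluster in clusters:
        side = taxa - cluster if ref in cluster else cluster
        # Skip trivial splits (one leaf versus the rest) and the empty side
        # produced by the root cluster.
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Classic RF distance for two singly-labeled trees on the same taxa:
    the number of bipartitions present in exactly one of the trees."""
    return len(splits(t1) ^ splits(t2))


def total_rf(species_tree: Tree, gene_trees: List[Tree]) -> int:
    """Summed RF distance from a candidate species tree to the inputs.
    Assumes every gene tree is singly-labeled over the same taxa."""
    return sum(rf_distance(species_tree, g) for g in gene_trees)


def best_candidate(candidates: List[Tree], gene_trees: List[Tree]) -> Tree:
    """Stand-in for the search step: pick the lowest-scoring tree from a
    hand-supplied list. MulRF instead explores tree space with SPR moves."""
    return min(candidates, key=lambda s: total_rf(s, gene_trees))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(t1, t2))                           # 2
    print(total_rf(t1, [t1, t2, t2]))                    # 4
    print(best_candidate([t1, t2], [t1, t2, t2]) == t2)  # True
```

In the example, each tree has one split the other lacks (AB|CDE versus AC|BDE), while ABC|DE is shared, so the distance is 2. Keying every split by the side without a fixed reference taxon turns the rooted tuple into an unrooted split set. As a result, the two clusters directly below the root count as one edge.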