
SDM: A Fast Distance-Based Approach for (Super)Tree Building in Phylogenomics



Abstract

Phylogenomic studies aim to build phylogenies from large sets of homologous genes. Such “genome-sized” data require fast methods, because of the typically large numbers of taxa examined. In this framework, distance-based methods are useful for exploratory studies and for building a starting tree to be refined by a more powerful maximum likelihood (ML) approach. However, estimating evolutionary distances directly from concatenated genes gives poor topological signal, as genes evolve at different rates. We propose a novel method, named super distance matrix (SDM), which follows the same line as average consensus supertree (ACS; Lapointe and Cucumel, 1997) and combines the evolutionary distances obtained from each gene into a single distance supermatrix to be analyzed using a standard distance-based algorithm. SDM deforms the source matrices, without modifying their topological message, to bring them as close as possible to each other; these deformed matrices are then averaged to obtain the distance supermatrix. We show that this problem is equivalent to the minimization of a least-squares criterion subject to linear constraints. This problem has a unique solution, which is obtained by solving a linear system. As this system is sparse, solving it in practice requires O(n^a k^a) time, where n is the number of taxa, k the number of matrices, and a < 2, which allows the distance supermatrix to be quickly obtained. Several uses of SDM are proposed, from fast exploratory studies to more accurate approaches requiring heavier computing time. Using simulations, we show that SDM is a relevant alternative to the standard matrix representation with parsimony (MRP) method, notably when the taxa sets of the different genes have low overlap. We also show that SDM can be used to build an excellent starting tree for an ML approach, which both reduces the computing time and increases the topological accuracy. We use SDM to analyze the data set of Gatesy et al. (2002, Syst. Biol. 51: 652–664) that involves 48 genes of 75 placental mammals. The results indicate that these genes have strong rate heterogeneity and confirm the simulation conclusions.
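
The idea of bringing the source matrices close to each other by solving a constrained least-squares problem, then averaging them, can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the abstract does not spell out the form of the deformation, so the sketch assumes one multiplicative scale factor per gene matrix and a normalisation constraint that the factors sum to k. The function name `scale_and_average`, the use of NaN for missing taxon pairs, and the dense NumPy solve (rather than a sparse solver) are all assumptions made for illustration.

```python
import numpy as np

def scale_and_average(matrices):
    """Scale-only stand-in for a distance supermatrix (illustrative assumption).

    matrices: list of k square arrays over the same taxon order, with NaN
    marking pairs that a gene does not cover.
    Returns (alpha, supermatrix).
    """
    mats = [np.asarray(m, dtype=float) for m in matrices]
    k = len(mats)

    # Quadratic form Q such that alpha^T Q alpha equals the sum, over all
    # pairs of matrices and all shared entries, of
    # (alpha_p * d_p - alpha_q * d_q)^2.
    Q = np.zeros((k, k))
    for p in range(k):
        for q in range(p + 1, k):
            shared = ~np.isnan(mats[p]) & ~np.isnan(mats[q])
            dp, dq = mats[p][shared], mats[q][shared]
            Q[p, p] += dp @ dp
            Q[q, q] += dq @ dq
            Q[p, q] -= dp @ dq
            Q[q, p] -= dp @ dq

    # Lagrange (KKT) system for: minimise alpha^T Q alpha
    # subject to sum(alpha) = k. This is a single linear solve.
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * Q
    kkt[:k, k] = 1.0
    kkt[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = k
    alpha = np.linalg.solve(kkt, rhs)[:k]

    # Average the scaled matrices entry by entry, ignoring missing values.
    scaled = np.stack([a * m for a, m in zip(alpha, mats)])
    counts = (~np.isnan(scaled)).sum(axis=0)
    total = np.nansum(scaled, axis=0)
    supermatrix = np.where(counts > 0, total / np.maximum(counts, 1), np.nan)
    return alpha, supermatrix
```

Under these assumptions, two genes whose distances differ only by a rate factor are rescaled onto a common scale before averaging, so a fast-evolving gene does not dominate the combined matrix, and pairs covered by only one gene still receive a value on that common scale. The resulting supermatrix would then be passed to a standard distance-based tree-building algorithm.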

Bibliographic record

  • Source
    Systematic Biology | 2006, issue 5 | pp. 740-755 | 16 pages
  • Author affiliations

    Groupe Phylogénie Moléculaire ISEM Université Montpellier 2 CC 064 34095 Montpellier Cedex 05 France;

    Equipe Méthodes et Algorithmes pour la Bioinformatique LIRMM (CNRS Université Montpellier 2) 161 rue Ada 34392 Montpellier Cedex 05 France E-mail: gascuel{at}lirmm.fr (O.G.);

  • Original format: PDF
  • Text language: eng
