SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees

Abstract

Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561–1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. 
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes–Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.
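
One ingredient behind the "uninformative" result can be checked numerically. The following is a minimal sketch, not the paper's proof: it assumes the standard Jukes–Cantor transition probabilities and Felsenstein's pruning algorithm, and it treats a gap as missing data by giving that leaf a partial likelihood of 1 for every state. Under those assumptions, an alignment column holding a single nucleotide and gaps elsewhere scores exactly 1/4 on any tree with any branch lengths, so an alignment built only from such columns cannot distinguish between trees. The tree encoding and the names `jc_prob`, `partial`, and `column_likelihood` are hypothetical, chosen for illustration.

```python
import math

NUCS = "ACGT"

def jc_prob(i, j, t):
    """Jukes-Cantor probability of state i changing to state j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, column):
    """Felsenstein pruning. A leaf is a taxon name (str); an internal node is a
    tuple of (child, branch_length) pairs. Returns one partial likelihood per state."""
    if isinstance(node, str):
        ch = column[node]
        if ch == "-":
            # Gap treated as missing data: compatible with every state.
            return [1.0] * 4
        return [1.0 if n == ch else 0.0 for n in NUCS]
    result = [1.0] * 4
    for child, t in node:
        child_partial = partial(child, column)
        for i in range(4):
            result[i] *= sum(jc_prob(i, j, t) * child_partial[j] for j in range(4))
    return result

def column_likelihood(tree, column):
    """Likelihood of one alignment column with a uniform root distribution."""
    return sum(0.25 * p for p in partial(tree, column))

# Two different topologies on taxa a, b, c, d with arbitrary branch lengths.
tree_ab_cd = (((("a", 0.1), ("b", 0.2)), 0.3), ((("c", 0.1), ("d", 0.5)), 0.05))
tree_ac_bd = (((("a", 0.7), ("c", 0.2)), 0.01), ((("b", 0.4), ("d", 0.9)), 0.6))

# A column in which only taxon a has a nucleotide; all others are gaps.
column = {"a": "A", "b": "-", "c": "-", "d": "-"}

for tree in (tree_ab_cd, tree_ac_bd):
    print(column_likelihood(tree, column))
```

Both calls print a value equal to 0.25 up to floating-point rounding, because the gapped leaves contribute a factor of 1 and the remaining path from the root to the single observed leaf is averaged over a stationary uniform root.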

Bibliographic details

  • Source
    Systematic Biology | 2012, Issue 1 | pp. 90-106 | 17 pages
  • Authors

    C. Randal Linder;

  • Author affiliation

    The University of Texas at Austin, Austin, TX 78712, USA;

  • Original format: PDF
  • Language: English
