Nucleic Acids Research

Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction

Abstract

Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Errors can therefore be introduced during MSA construction, and these in turn affect the subsequent phylogenetic tree construction. To circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum of the differences in frequency, over all possible tuples of length k, between the sequences, and it can be estimated without an MSA. It has traditionally been used to build a fast ‘guide tree’ to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, and both neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes–Cantor, Kimura, F84 and Tamura–Nei. These four are model-based distance estimators, as each of them assumes a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be twice that of the other estimators, or higher. Furthermore, because the k-tuple distance avoids the need to construct an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
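As a rough illustration of the alignment-free distance described in the abstract, the following Python sketch counts overlapping k-tuples in two unaligned sequences and sums the per-tuple differences. The abstract does not say whether the differences are absolute or squared, or whether raw counts or relative frequencies are used, so absolute differences of raw counts are an assumption here. The function names `ktuple_counts`, `ktuple_distance` and `distance_matrix`, and the default `k = 3`, are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-tuple (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a: str, b: str, k: int = 3) -> int:
    """Sum of absolute count differences over all k-tuples.

    Assumption: absolute differences of raw counts; the paper's exact
    formula may differ (for example, squared differences).
    """
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    # Tuples absent from both sequences contribute zero, so iterating
    # over the union of observed tuples covers all possible tuples.
    return sum(abs(ca[t] - cb[t]) for t in ca.keys() | cb.keys())


def distance_matrix(seqs: dict[str, str], k: int = 3) -> dict[tuple[str, str], int]:
    """Pairwise k-tuple distances for named, unaligned sequences."""
    return {
        (x, y): ktuple_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }
```

A matrix produced this way could be passed to any neighbor-joining or BioNJ implementation; no alignment step is involved, which is where the time saving mentioned in the abstract comes from.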
