Journal: Systematic Biology

Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets

Abstract

The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities.
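The abstract reports precision and recall of estimated alignments against true or reference alignments, but this listing does not give the exact definitions used. As a minimal sketch, assuming the common convention that an alignment is scored by the pairs of residues it places in the same column (its "homologies"), the two measures could be computed as below. The function names `homology_pairs` and `precision_recall` are hypothetical and are not taken from the paper or from BAli-Phy.

```python
from itertools import combinations


def homology_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings. Each residue is
    identified as (sequence index, ungapped position in that sequence).
    This pairwise definition is an assumption, not the paper's stated one.
    """
    counters = [0] * len(alignment)  # ungapped position reached in each row
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((s, counters[s]))
                counters[s] += 1
        # every pair of residues in this column counts as one homology
        pairs.update(combinations(residues, 2))
    return pairs


def precision_recall(estimated, reference):
    """Precision and recall of estimated homologies against the reference."""
    est = homology_pairs(estimated)
    ref = homology_pairs(reference)
    shared = len(est & ref)
    precision = shared / len(est) if est else 0.0
    recall = shared / len(ref) if ref else 0.0
    return precision, recall


# Toy example: the estimate leaves the two G residues unaligned.
reference = ["ACG", "ACG"]
estimated = ["ACG-", "AC-G"]
precision, recall = precision_recall(estimated, reference)
print(precision, recall)  # precision is 1.0, recall is 2/3
```

In the toy example every homology the estimate asserts is correct, yet one reference homology is missing, so precision stays at 1.0 while recall falls to 2/3. This is the pattern the abstract calls underalignment.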
