International Symposium on Bioinformatics Research and Applications
Pairwise Statistical Significance Versus Database Statistical Significance for Local Alignment of Protein Sequences

Abstract

An important aspect of pairwise sequence comparison is assessing the statistical significance of the alignment. Most currently popular alignment programs report the statistical significance of an alignment in the context of a database search. This database statistical significance depends on the database, and hence the same alignment of a pair of sequences may be assigned different statistical significance values in different databases. In this paper, we explore the use of pairwise statistical significance, which is independent of any database and is useful when we have only a pair of sequences and want to comment on their relatedness. We compared different methods and determined that censored maximum-likelihood fitting of the score distribution to the right of the peak is the most accurate method for estimating pairwise statistical significance. We evaluated this method in an experiment with a subset of CATH2.3, which had previously been used by other authors as a benchmark data set for protein comparison. Comparison with the database statistical significance reported by popular programs such as SSEARCH and PSI-BLAST indicates that the results of pairwise statistical significance are comparable to, and sometimes significantly better than, those of database statistical significance (with SSEARCH). However, PSI-BLAST performs best, presumably because of its use of query-specific substitution matrices.
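To make the fitting step in the abstract concrete, the following is a minimal sketch of a left-censored maximum-likelihood fit, not the authors' implementation. It assumes that the null scores follow a Gumbel (extreme value) distribution with location `mu` and rate `lam`, that the peak is taken as the most populated histogram bin, and that the null scores for a pair come from some resampling scheme such as aligning one sequence against shuffled copies of the other; the abstract specifies none of these details. All function names are hypothetical, and the scores here are simulated rather than produced by real alignments.

```python
import math
import random


def histogram_peak(scores, bin_width=1.0):
    """Centre of the most populated bin: a crude estimate of the peak (assumption)."""
    counts = {}
    for s in scores:
        b = math.floor(s / bin_width)
        counts[b] = counts.get(b, 0) + 1
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width


def fit_gumbel_left_censored(scores, cutoff, lo=1e-3, hi=10.0, iters=80):
    """Censored ML fit of Gumbel(mu, lam).

    Scores >= cutoff contribute their density; scores below cutoff
    contribute only through their count, via the CDF at the cutoff.
    """
    y = [s - cutoff for s in scores if s >= cutoff]  # shifted for numerical stability
    n_obs = len(y)
    n_cens = len(scores) - n_obs
    if n_obs < 2:
        raise ValueError("need at least two scores at or above the cutoff")
    sum_y = sum(y)

    def shifted_mu(lam):
        # For fixed lam the ML location has a closed form.
        t = sum(math.exp(-lam * v) for v in y) + n_cens
        return math.log(n_obs / t) / lam

    def profile_loglik(lam):
        # Log-likelihood with mu replaced by its optimum for this lam.
        return (n_obs * math.log(lam) - lam * sum_y
                + n_obs * lam * shifted_mu(lam) - n_obs)

    # Golden-section search for the maximum over log(lam).
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if profile_loglik(math.exp(c)) > profile_loglik(math.exp(d)):
            b = d
        else:
            a = c
    lam = math.exp((a + b) / 2.0)
    return cutoff + shifted_mu(lam), lam


def pairwise_p_value(score, mu, lam):
    """P(S >= score) under the fitted Gumbel: 1 - exp(-exp(-lam * (score - mu)))."""
    z = min(-lam * (score - mu), 700.0)  # clamp to avoid overflow in exp
    return -math.expm1(-math.exp(z))


def simulate_gumbel_scores(n, mu, lam, seed=0):
    """Stand-in for null alignment scores: draws from Gumbel(mu, lam)."""
    rng = random.Random(seed)
    return [mu - math.log(-math.log(max(rng.random(), 1e-300))) / lam
            for _ in range(n)]


if __name__ == "__main__":
    null_scores = simulate_gumbel_scores(5000, mu=30.0, lam=0.2, seed=1)
    peak = histogram_peak(null_scores)
    mu_hat, lam_hat = fit_gumbel_left_censored(null_scores, peak)
    print("peak:", peak, "mu:", mu_hat, "lambda:", lam_hat)
    print("p-value for score 60:", pairwise_p_value(60.0, mu_hat, lam_hat))
```

Censoring the scores to the left of the peak means the fit is driven by the right tail, which is the region that determines the significance of a high-scoring alignment. In practice, `null_scores` would be replaced by scores from a real local alignment program run on the pair of interest.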
