Conference proceedings: Bioinformatics Research and Applications

Estimating Pairwise Statistical Significance of Protein Local Alignments Using a Clustering-Classification Approach Based on Amino Acid Composition



Abstract

A central question in pairwise sequence comparison is assessing the statistical significance of an alignment. For ungapped alignments with a single substitution matrix, the alignment score is known to follow an extreme value distribution whose parameters K and λ can be calculated analytically. No statistical theory is currently available for the gapped case or for alignments using multiple scoring matrices, although their score distributions are known to closely follow an extreme value distribution and the corresponding parameters can be estimated by simulation. Ideal estimation would require a simulation for each sequence pair, which is impractical. In this paper, we present a simple clustering-classification approach based on amino acid composition to estimate K and λ for a given sequence pair and scoring scheme, including schemes that use multiple parameter sets. The resulting K and λ values vary widely across cluster pairs even for the same scoring scheme, underscoring the heavy dependence of K and λ on amino acid composition. The proposed approach is an attempt to separate out the influence of amino acid composition when estimating the statistical significance of pairwise protein alignments. Experiments and analysis of other approaches to estimating statistical parameters also indicate that the methods used in this work estimate statistical significance with good accuracy.
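The workflow the abstract describes could be sketched roughly as follows. This is only an illustration, not the authors' implementation. The abstract does not say how clusters are built or how a sequence is assigned to one, so the nearest-centroid rule, the function names, and the `centroids` and `params` inputs are all assumptions. The p-value formula is the standard extreme value form P(S ≥ x) ≈ 1 − exp(−K·m·n·e^(−λx)), used here without edge-effect corrections.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def nearest_cluster(comp, centroids):
    """Index of the centroid closest to comp (squared Euclidean distance).

    Assumption: nearest-centroid assignment stands in for the paper's
    classification step, which the abstract does not specify.
    """
    def dist2(c):
        return sum((x - y) ** 2 for x, y in zip(comp, c))
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i]))

def evd_pvalue(score, m, n, K, lam):
    """P(S >= score) under the extreme value distribution.

    Computes 1 - exp(-K*m*n*exp(-lam*score)); expm1 keeps precision
    when the p-value is very small.
    """
    expected = K * m * n * math.exp(-lam * score)
    return -math.expm1(-expected)

def pairwise_pvalue(seq_a, seq_b, score, centroids, params):
    """Look up (K, lambda) for the cluster pair and return the p-value.

    params is a hypothetical table mapping an ordered cluster pair
    (i, j) with i <= j to a (K, lambda) tuple estimated beforehand,
    for example by simulation.
    """
    i = nearest_cluster(composition(seq_a), centroids)
    j = nearest_cluster(composition(seq_b), centroids)
    K, lam = params[(min(i, j), max(i, j))]
    return evd_pvalue(score, len(seq_a), len(seq_b), K, lam)
```

In such a scheme the expensive simulations would run once per cluster pair and scoring scheme. Scoring a new sequence pair then needs only two composition vectors, two cluster assignments, and one table lookup.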
