Journal: Bioinformatics

Fast probabilistic analysis of sequence function using scoring matrices.

Abstract

MOTIVATION: We present techniques for increasing the speed of sequence analysis using scoring matrices. Our techniques are based on calculating, for a given scoring matrix, the quantile function, which assigns a probability, or p value, to each segmental score. Our techniques also let the user specify a p threshold that sets the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow scoring matrices to be used more widely in large-scale sequencing and annotation projects.

RESULTS: We develop three techniques for increasing the speed of sequence analysis: probability filtering, lookahead scoring, and permuted lookahead scoring. In probability filtering, we compute the score threshold that corresponds to the user-specified p threshold and use it to limit the number of segments retained in the search process. In lookahead scoring, we test intermediate scores to determine whether they can still exceed the score threshold. In permuted lookahead scoring, we score the positions within each segment in a particular order designed to maximize the likelihood of early termination. Our two lookahead scoring techniques substantially reduce the number of residues that must be examined: depending on the p threshold chosen by the user, the fraction of residues examined ranges from 62% down to 6%. These techniques permit sequence analysis with scoring matrices at speeds several times faster than existing programs. On a database of 12,177 alignment blocks, our techniques permit sequence analysis at 225 residues/s for a p threshold of 10⁻⁶ and at 541 residues/s for a p threshold of 10⁻²⁰. To compute the quantile function, we may use either an independence assumption or a Markov assumption.
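The probability-filtering and lookahead ideas above can be illustrated with a small Python sketch. It is not EMATRIX code: it assumes integer matrix scores stored as one residue-to-score dict per position, computes the exact score distribution under the independence assumption by convolving column by column, and reads the score threshold off the upper tail of that distribution. All function names are hypothetical, and the column ordering used for permuted lookahead (largest gap between best and expected column score first) is an assumed heuristic, not necessarily the ordering the authors use.

```python
import math
from collections import defaultdict


def score_distribution(pssm, background):
    """Exact segment-score distribution under the independence assumption.

    pssm: list with one dict per position, mapping residue -> integer score.
    background: dict mapping residue -> background probability.
    Returns a dict mapping total score -> probability.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for residue, score in column.items():
                nxt[partial + score] += prob * background[residue]
        dist = dict(nxt)
    return dist


def p_value(dist, score):
    """Quantile function: P(S >= score) for a random background segment."""
    return sum(prob for s, prob in dist.items() if s >= score)


def score_threshold(dist, p_threshold):
    """Probability filtering: smallest score whose p value is <= p_threshold.

    Returns math.inf when even the best possible score is too probable.
    """
    threshold, tail = math.inf, 0.0
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail == P(S >= s)
        if tail > p_threshold:
            break
        threshold = s
    return threshold


def scoring_order(pssm, background):
    """Permuted order (assumed heuristic): positions whose best score exceeds
    their expected score by the most are examined first."""
    def gap(i):
        column = pssm[i]
        expected = sum(background[r] * s for r, s in column.items())
        return max(column.values()) - expected
    return sorted(range(len(pssm)), key=gap, reverse=True)


def remaining_bounds(pssm, order):
    """bounds[k] = highest score still obtainable from positions order[k:]."""
    bounds = [0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        bounds[k] = bounds[k + 1] + max(pssm[order[k]].values())
    return bounds


def lookahead_score(segment, pssm, order, bounds, threshold):
    """Score one segment, stopping once the threshold becomes unreachable.

    Returns (score or None, number of residues examined).
    """
    partial = 0
    for k, pos in enumerate(order):
        if partial + bounds[k] < threshold:
            return None, k
        partial += pssm[pos][segment[pos]]
    return (partial if partial >= threshold else None), len(order)


def search(sequence, pssm, background, p_threshold):
    """Report (start, score, p value) for every window at or above threshold."""
    dist = score_distribution(pssm, background)
    threshold = score_threshold(dist, p_threshold)
    order = scoring_order(pssm, background)
    bounds = remaining_bounds(pssm, order)
    width, hits, examined = len(pssm), [], 0
    for start in range(len(sequence) - width + 1):
        score, n = lookahead_score(sequence[start:start + width],
                                   pssm, order, bounds, threshold)
        examined += n
        if score is not None:
            hits.append((start, score, p_value(dist, score)))
    return hits, examined
```

For instance, with a hypothetical three-position DNA matrix scoring +2 for A, C and G at positions 1, 2 and 3 (and -1 otherwise) and uniform background frequencies, a p threshold of 0.02 gives a score threshold of 6 (p = 1/64); scanning TACGAACG then reports the two ACG windows at offsets 1 and 5 after 11 of the 18 possible residue lookups. Real matrices are much wider and use far smaller p thresholds, so a practical implementation would also need to keep the score distribution compact, for example by rounding scores.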
We measure the effect of first- and second-order Markov assumptions and find that, compared with the independence assumption, they tend to raise the p values of segments by average ratios of 1.30 and 1.69, respectively. We also compare our technique with the empirical 99.5th-percentile scores compiled in the BLOCKSPLUS database and find that these scores correspond on average to a p value of 1.5 × 10⁻⁵.

AVAILABILITY: The techniques described above are implemented in a software package called EMATRIX, which is available from the authors for free academic use or for licensed commercial use. The EMATRIX set of programs is also available on the Internet at http://motif.stanford.edu/ematrix.
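Under a Markov assumption, the same column-by-column dynamic program can carry the previous residue as extra state. The Python sketch below handles the first-order case (a second-order version would track the last two residues). It is an illustration under stated assumptions, not the EMATRIX implementation: the function names are hypothetical and the transition probabilities in the example are made up.

```python
from collections import defaultdict


def score_distribution_markov1(pssm, initial, transition):
    """Segment-score distribution under a first-order Markov background.

    pssm: list with one dict per position, mapping residue -> integer score.
    initial: residue -> probability of the first residue.
    transition: residue a -> {residue b: P(b | a)}.
    Returns a dict mapping total score -> probability.
    """
    # state[(partial score, last residue)] -> probability
    state = defaultdict(float)
    for residue, score in pssm[0].items():
        state[(score, residue)] += initial[residue]
    for column in pssm[1:]:
        nxt = defaultdict(float)
        for (partial, prev), prob in state.items():
            for residue, score in column.items():
                nxt[(partial + score, residue)] += prob * transition[prev][residue]
        state = nxt
    # Marginalise out the last residue to get the score distribution.
    dist = defaultdict(float)
    for (score, _), prob in state.items():
        dist[score] += prob
    return dict(dist)


def p_value(dist, score):
    """P(S >= score) for a random background segment."""
    return sum(prob for s, prob in dist.items() if s >= score)
```

With a two-letter alphabet {A, T}, two positions that each score 1 for A and 0 for T, equal initial frequencies, and a "sticky" chain that repeats the previous residue with probability 0.75, the top score of 2 has p = 0.375, compared with 0.25 under independence with the same residue frequencies. This toy chain only shows the mechanism by which a Markov background can raise p values; it says nothing about the average ratios measured on real blocks.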