Fast probabilistic analysis of sequence function using scoring matrices.

Wu TD; Nevill-Manning CG; Brutlag DL

Abstract

MOTIVATION: We present techniques for increasing the speed of sequence analysis using scoring matrices. Our techniques are based on calculating, for a given scoring matrix, the quantile function, which assigns a probability, or p-value, to each segmental score. Our techniques also permit the user to specify a p threshold to indicate the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow scoring matrices to be used more widely in large-scale sequencing and annotation projects.

RESULTS: We develop three techniques for increasing the speed of sequence analysis: probability filtering, lookahead scoring, and permuted lookahead scoring. In probability filtering, we compute the score threshold that corresponds to the user-specified p threshold. We use the score threshold to limit the number of segments that are retained in the search process. In lookahead scoring, we test intermediate scores to determine whether they can still possibly exceed the score threshold. In permuted lookahead scoring, we score each segment in a particular order designed to maximize the likelihood of early termination. Our two lookahead scoring techniques substantially reduce the number of residues that must be examined. The fraction of residues examined ranges from 62% to 6%, depending on the p threshold chosen by the user. These techniques permit sequence analysis with scoring matrices at speeds several times faster than existing programs. On a database of 12 177 alignment blocks, our techniques permit sequence analysis at a speed of 225 residues/s for a p threshold of 10^-6, and 541 residues/s for a p threshold of 10^-20. In order to compute the quantile function, we may use either an independence assumption or a Markov assumption. We measure the effect of first- and second-order Markov assumptions and find that they tend to raise the p-value of segments, when compared with the independence assumption, by average ratios of 1.30 and 1.69, respectively. We also compare our technique with the empirical 99.5th percentile scores compiled in the BLOCKSPLUS database, and find that they correspond on average to a p-value of 1.5 x 10^-5.

AVAILABILITY: The techniques described above are implemented in a software package called EMATRIX. This package is available from the authors for free academic use or for licensed commercial use. The EMATRIX set of programs is also available on the Internet at http://motif.stanford.edu/ematrix.
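The three techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the EMATRIX implementation: every function name below is hypothetical, the scores are assumed to be integers, the score distribution is computed under the independence assumption only, and the column ordering in `permuted_order` is an illustrative heuristic rather than the ordering used by the authors.

```python
from collections import defaultdict


def score_distribution(matrix, background):
    """Exact distribution of segment scores, assuming independent residues.

    matrix: list of dicts, one per column, mapping residue -> integer score.
    background: dict mapping residue -> background probability.
    """
    dist = {0: 1.0}
    for column in matrix:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for residue, q in background.items():
                nxt[score + column[residue]] += prob * q
        dist = dict(nxt)
    return dist


def score_threshold(dist, p_threshold):
    """Probability filtering: lowest score s with P(score >= s) <= p_threshold."""
    tail = 0.0
    threshold = max(dist) + 1  # unreachable score: no segment passes
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        if tail > p_threshold:
            break
        threshold = s
    return threshold


def permuted_order(matrix, background):
    """Assumed heuristic: visit first the columns where a random residue is
    expected to fall furthest below the column maximum."""
    def expected_loss(column):
        mean = sum(background[r] * column[r] for r in background)
        return max(column.values()) - mean
    return sorted(range(len(matrix)),
                  key=lambda c: expected_loss(matrix[c]), reverse=True)


def lookahead_scan(sequence, matrix, threshold, order=None):
    """Lookahead scoring: abandon a segment as soon as its partial score plus
    the best possible remaining score cannot reach the threshold.

    Returns (hits, examined): hits is a list of (start, score) pairs and
    examined counts the residues actually scored.
    """
    width = len(matrix)
    order = list(range(width)) if order is None else list(order)
    # max_remaining[k]: best score obtainable from the columns visited at step k or later
    max_remaining = [0] * (width + 1)
    for k in range(width - 1, -1, -1):
        max_remaining[k] = max_remaining[k + 1] + max(matrix[order[k]].values())
    hits, examined = [], 0
    for start in range(len(sequence) - width + 1):
        score = 0
        for k, col in enumerate(order):
            score += matrix[col][sequence[start + col]]
            examined += 1
            if score + max_remaining[k + 1] < threshold:
                break  # early termination: threshold is out of reach
        else:
            hits.append((start, score))
    return hits, examined
```

Passing `order=permuted_order(matrix, background)` to `lookahead_scan` gives a permuted lookahead scan; it returns the same hits as the left-to-right scan and differs only in how many residues are examined before a segment is abandoned.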

Bibliographic details

Source: Bioinformatics, 2000, Issue 3, 12 pages
Authors: Wu TD; Nevill-Manning CG; Brutlag DL
Original format: PDF
Language: English
Chinese Library Classification: Biological sciences; Bioengineering (biotechnology)
Keywords: Sequence Analysis; Software
