
Significant speedup of database searches with HMMs by search space reduction with PSSM family models


Abstract

Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive.

Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs, as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrices family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet results could be obtained in a fraction of the runtime, with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup of factor 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup of factor 92.

Availability: The presented algorithms are implemented in the program PoSSuMsearch2, available for download at .

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
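The prefiltering idea in the Results can be illustrated with a minimal Python sketch. It is not the PoSSuMsearch2 implementation: it assumes integer PSSM scores and an i.i.d. background distribution, scans by brute force instead of using a full-text index, handles a single PSSM rather than a family model, and omits fragment chaining. All function names and the two-letter toy alphabet are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def score_distribution(pssm, background):
    """Exact distribution of total PSSM scores under an i.i.d. background.

    pssm: list of columns, each a dict symbol -> integer score (assumed format).
    background: dict symbol -> probability.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for symbol, score in column.items():
                nxt[partial + score] += prob * background[symbol]
        dist = dict(nxt)
    return dist

def p_value(dist, threshold):
    """Probability that a random window scores at least `threshold`."""
    return sum(p for s, p in dist.items() if s >= threshold)

def scan(pssm, sequence, threshold):
    """Brute-force sliding window; returns (position, score) of matches."""
    m = len(pssm)
    hits = []
    for i in range(len(sequence) - m + 1):
        s = sum(pssm[j][sequence[i + j]] for j in range(m))
        if s >= threshold:
            hits.append((i, s))
    return hits

def prefilter(pssm, background, sequences, max_p):
    """Keep only the sequences with at least one match at p-value <= max_p."""
    dist = score_distribution(pssm, background)
    # Smallest score threshold whose exact p-value does not exceed max_p.
    threshold = next((t for t in sorted(dist) if p_value(dist, t) <= max_p), None)
    if threshold is None:
        return []
    return [name for name, seq in sequences.items() if scan(pssm, seq, threshold)]

if __name__ == "__main__":
    # Toy data over a two-letter alphabet (purely illustrative).
    background = {"A": 0.5, "B": 0.5}
    pssm = [{"A": 2, "B": -1}, {"A": -1, "B": 2}]
    sequences = {"s1": "AABB", "s2": "BBAA"}
    print(prefilter(pssm, background, sequences, max_p=0.25))
```

In a filter setup like the one described, only the sequences returned by such a step would be passed on to the more expensive pHMM search; how tightly `max_p` is set trades the size of the reduced search space against the risk of losing true hits.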
