BMC Bioinformatics

TOPAZ: asymmetric suffix array neighbourhood search for massive protein databases


Abstract

Protein homology search is an important, yet time-consuming, step in everything from protein annotation to metagenomics. Its application, however, has become increasingly challenging, due to the exponential growth of protein databases. In order to perform homology search at the required scale, many methods have been proposed as alternatives to BLAST that make an explicit trade-off between sensitivity and speed. One such method, SANSparallel, uses a parallel implementation of the suffix array neighbourhood search (SANS) technique to achieve high speed and provides several modes to allow for greater sensitivity at the expense of performance. We present a new approach called asymmetric SANS together with scored seeds and an alternative suffix array ordering scheme called optimal substitution ordering. These techniques dramatically improve both the sensitivity and speed of the SANS approach. Our implementation, TOPAZ, is one of the top performing methods in terms of speed, sensitivity and scalability. In our benchmark, searching UniProtKB for homologous proteins to the Dickeya solani proteome, TOPAZ took less than 3 minutes to achieve a sensitivity of 0.84 compared to BLAST. Despite the trade-off homology search methods have to make between sensitivity and speed, TOPAZ stands out as one of the most sensitive and highest performance methods currently available.
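
To make the basic idea behind suffix array neighbourhood search concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not TOPAZ's or SANSparallel's implementation: it uses plain lexicographic ordering, a naive suffix array, and simple vote counting, and it does not model asymmetric SANS, scored seeds, or optimal substitution ordering. The function names, the `window` parameter, and the toy sequences are all hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def build_suffix_array(proteins):
    """Return a sorted list of (suffix, protein_id) pairs.

    Naive construction that stores every suffix explicitly; fine for a toy
    example, far too memory-hungry for a real protein database.
    """
    suffixes = []
    for pid, seq in proteins.items():
        for i in range(len(seq)):
            suffixes.append((seq[i:], pid))
    suffixes.sort()
    return suffixes


def neighbourhood_candidates(query, suffix_array, window=2):
    """Rank database proteins by how often their suffixes fall near the
    query's suffixes in the sorted suffix array.

    For each query suffix we find where it would be inserted, then give one
    vote to the protein owning each of the `window` entries on either side.
    """
    keys = [suffix for suffix, _ in suffix_array]
    votes = Counter()
    for i in range(len(query)):
        pos = bisect_left(keys, query[i:])
        lo = max(0, pos - window)
        hi = min(len(suffix_array), pos + window)
        for _, pid in suffix_array[lo:hi]:
            votes[pid] += 1
    return votes.most_common()


# Toy database: one sequence close to the query, one unrelated.
database = {"hit": "MKTAYIAKQR", "other": "WWWWWWWW"}
index = build_suffix_array(database)
ranked = neighbourhood_candidates("MKTAYIAKQ", index)
print(ranked)
```

In a sketch like this, the proteins with the most votes would be passed to a slower alignment step for verification; a larger `window` examines more neighbours per suffix, which is one simple way to trade speed for sensitivity.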
