Journal: BMC Bioinformatics

GRASP2: fast and memory-efficient gene-centric assembly and homolog search for metagenomic sequencing data


Abstract

A crucial task in metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or searched individually without assembly. The first approach may suffer from fragmentary and incomplete assembly, while the second is hampered by the reduced functional signal contained in short reads. To tackle these issues, we previously developed GRASP (Guided Reference-based Assembly of Short Peptides), which accepts a reference protein sequence as input and aims to assemble its homologs from a database containing fragmentary protein sequences. Besides being a gene-centric assembly tool, GRASP also serves as a homolog search tool when the assembled protein sequences are used as templates to recruit reads. GRASP achieves a significantly higher recall rate (60-80% vs. 30-40%) than other homolog search tools such as BLAST. However, GRASP is costly in both time and memory. Subsequently, we developed GRASPx, which is 30X faster than GRASP.

Here, we present a completely redesigned algorithm, GRASP2, for this computational problem. GRASP2 uses the Burrows-Wheeler Transform (BWT) and the FM-index to generate the assembly graph, and reduces the search space by employing a fast ungapped alignment strategy as a filter. GRASP2 also explicitly generates candidate paths prior to alignment, which effectively uncouples the iterative access of the assembly graph from that of the alignment matrix. This strategy makes the program execute more efficiently on current computer architectures, and contributes to GRASP2's speedup. GRASP2 is 8-fold faster than GRASPx (and 250-fold faster than GRASP) and uses 8-fold less memory, while maintaining the original high recall rate of GRASP. GRASP2 reaches a recall rate of ~80%, compared to ~40% for BLAST, both at a high precision level (>95%). With such high performance, GRASP2 is only ~3X slower than BLASTP.

GRASP2 is a high-performance gene-centric assembly and homolog search tool with a significant speedup over its predecessors, which makes it a useful tool for metagenomics data analysis. GRASP2 is implemented in C++ and is freely available from http://www.sourceforge.net/projects/grasp2 .
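
The abstract names the BWT and the FM-index as the structures behind assembly graph generation, but does not describe how they are queried. As a rough illustration only, the following Python sketch shows the standard FM-index backward search for exact substring matching over a single short peptide string. It is not GRASP2's implementation (which is in C++), the function names `bwt_and_suffix_array`, `build_fm_index`, `backward_search`, and `find_all` are hypothetical, and the naive suffix sorting used here would not scale to real read sets.

```python
def bwt_and_suffix_array(text):
    """Return (BWT, suffix array) of text + '$' using naive suffix sorting.

    Assumes '$' does not occur in text and sorts before every other symbol
    (true for uppercase amino-acid letters in ASCII).
    """
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for a suffix is the character just before it
    # (wrapping around to '$' for the suffix starting at position 0).
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt):
    """Build the two FM-index tables from a BWT string.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    # Extend the match one character at a time, from the pattern's last
    # character to its first.
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_all(text, pattern):
    """Return the sorted start positions of pattern in text."""
    if not pattern:
        return []
    bwt, sa = bwt_and_suffix_array(text)
    C, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    # Toy peptide string; "MKV" occurs at positions 0 and 7.
    print(find_all("MKVLAAGMKVL", "MKV"))
```

Each step of the backward search touches only the `C` and `occ` tables, so the number of steps depends on the pattern length rather than on the size of the indexed text. This is the general property that makes FM-index lookups attractive for overlap detection among many short sequences.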
