Journal: BMC Genomics

MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence



Abstract

Programs based on hash tables and the Burrows-Wheeler transform map short reads to genomes very quickly but lose accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm, but mapping millions of reads can take hours or days even for bacterial genomes. We introduce MaxSSmap, a GPU program that aims to achieve accuracy comparable to Smith-Waterman with faster runtimes. Like most mapping programs, MaxSSmap first identifies a local region of the genome and then performs an exact alignment. Instead of using hash tables or Burrows-Wheeler in the first step, MaxSSmap calculates the maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest-scoring fragment for exact alignment. We evaluate MaxSSmap’s accuracy and runtime by mapping simulated Illumina E. coli and human chromosome one reads of different lengths, with 10% to 30% mismatches and with gaps, to the E. coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and by mapping unmapped paired reads from NA12878 in the 1000 Genomes data. We show that MaxSSmap attains high accuracy and low error comparable to fast Smith-Waterman programs, yet has much lower runtimes. We also show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error, much faster than if Smith-Waterman were used. At the short read lengths of 36 and 51, both MaxSSmap and Smith-Waterman are less accurate than at longer read lengths. On real data, MaxSSmap produces many alignments with high score and mapping quality that NextGenMap and BWA do not report. The MaxSSmap source code in CUDA and OpenCL is freely available from http://www.cs.njit.edu/usman/MaxSSmap .
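The fragment-selection step described in the abstract can be illustrated with a small CPU-only sketch. This is not the MaxSSmap implementation: the +1/-1 match and mismatch scores, the fragment length, the choice to slide the read over every offset that starts inside a fragment, and the function names are all assumptions made for illustration, and the per-fragment loop that MaxSSmap runs in parallel on a GPU is sequential here. The maximum scoring subsequence itself is computed with the standard linear-time scan (Kadane's algorithm).

```python
def max_scoring_subsequence(scores):
    """Largest sum of any contiguous run of scores (0 if every run is negative)."""
    best = current = 0
    for s in scores:
        # Restart the run whenever its running sum would drop below zero.
        current = max(0, current + s)
        best = max(best, current)
    return best


def fragment_score(read, genome, start, frag_len, match=1, mismatch=-1):
    """Best score over all read placements that begin inside one fragment.

    Assumption: a placement may extend past the fragment end into the
    neighbouring bases; match/mismatch values are illustrative only.
    """
    best = 0
    end = min(start + frag_len, len(genome) - len(read) + 1)
    for offset in range(start, end):
        window = genome[offset:offset + len(read)]
        scores = [match if a == b else mismatch for a, b in zip(read, window)]
        best = max(best, max_scoring_subsequence(scores))
    return best


def best_fragment(read, genome, frag_len):
    """Return (fragment start, score) of the highest-scoring disjoint fragment."""
    starts = range(0, len(genome), frag_len)
    scored = [(fragment_score(read, genome, s, frag_len), s) for s in starts]
    # Ties go to the leftmost fragment.
    score, start = max(scored, key=lambda t: (t[0], -t[1]))
    return start, score


# Toy genome of three 10-base fragments; the read carries one mismatch
# relative to its true origin in the middle fragment.
genome = "AAAAAAAAAA" + "CGTACGGATC" + "TTTTTTTTTT"
print(best_fragment("GTACCGAT", genome, frag_len=10))  # (10, 6)
```

In a full mapper, the selected fragment would then be passed to an exact aligner such as Smith-Waterman, which is the second step the abstract describes.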
