MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence

Turki Turki; Usman Roshan

Abstract

Programs based on hash tables and the Burrows-Wheeler transform map short reads to genomes very quickly but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm, but mapping millions of reads can take hours or days even for bacterial genomes. We introduce a GPU program called MaxSSmap that aims to achieve accuracy comparable to Smith-Waterman with faster runtimes. Like most mapping programs, MaxSSmap first identifies a local region of the genome and then performs exact alignment. Instead of using hash tables or the Burrows-Wheeler transform in the first step, MaxSSmap computes the maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap's accuracy and runtime by mapping simulated Illumina E. coli and human chromosome one reads of different lengths, with 10% to 30% mismatches and with gaps, to the E. coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and by mapping unmapped paired reads from NA12878 in the 1000 Genomes Project. We show that MaxSSmap attains high accuracy and low error comparable to fast Smith-Waterman programs, yet has much lower runtimes. We also show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error, and much faster than if Smith-Waterman were used. At short read lengths of 36 and 51, both MaxSSmap and Smith-Waterman are less accurate than at longer read lengths. On real data, MaxSSmap produces many alignments with high score and mapping quality that are not reported by NextGenMap and BWA. The MaxSSmap source code in CUDA and OpenCL is freely available from http://www.cs.njit.edu/usman/MaxSSmap .
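The fragment-selection step described in the abstract can be illustrated with a small sequential sketch. This is not the MaxSSmap implementation: the function names, the +1/-1 match and mismatch scores, and the choice of fragment length equal to the read length are all assumptions made for illustration, and the loop over fragments stands in for work that the program runs in parallel on a GPU. The sketch also ignores gaps and reads that straddle a fragment boundary.

```python
def max_scoring_subsequence(scores):
    """Best sum over any contiguous run of scores (0 for the empty run).

    This is the standard linear-time scan, often called Kadane's algorithm.
    """
    best = current = 0
    for s in scores:
        current = max(0, current + s)  # restart the run when it drops below 0
        best = max(best, current)
    return best


def fragment_score(read, fragment, match=1, mismatch=-1):
    """Score a read against one genome fragment, position by position.

    The match/mismatch values are hypothetical, not taken from the paper.
    """
    scores = [match if r == g else mismatch for r, g in zip(read, fragment)]
    return max_scoring_subsequence(scores)


def best_fragment(read, genome, match=1, mismatch=-1):
    """Return (start, score) of the highest scoring disjoint fragment.

    Fragments are assumed to have the same length as the read. On a GPU,
    each fragment would typically be scored by its own thread; here the
    fragments are scored one after another.
    """
    n = len(read)
    best_start, best_score = -1, -1
    for start in range(0, len(genome) - n + 1, n):
        score = fragment_score(read, genome[start:start + n], match, mismatch)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    genome = "AAAAAAAA" + "ACGTACGT" + "TTTTTTTT"
    read = "ACGTTCGT"  # one mismatch against the middle fragment
    print(best_fragment(read, genome))  # (8, 6)
```

In a full mapper, the region around the selected fragment would then be passed to an exact aligner such as Smith-Waterman, which is the second step the abstract mentions.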


Bibliographic record

Source: BMC Genomics, 2014, Issue 1
Authors: Turki Turki; Usman Roshan
Original format: PDF
Chinese Library Classification: Medical genetics

