Next-generation massively parallel short-read mapping on FPGAs


Abstract

The mapping of DNA sequences to huge genome databases is an essential analysis task in modern molecular biology. With linearized reference genomes available, aligning the short DNA reads obtained from sequencing an individual genome against such a database provides a powerful diagnostic and analysis tool. In essence, this task amounts to a simple string search that tolerates a certain number of mismatches to account for the diversity of individuals. The complexity of this process arises from the sheer size of the reference genome. It is further amplified by current next-generation sequencing technologies, which produce a huge number of increasingly short reads. These short reads severely degrade established alignment heuristics such as BLAST. This paper proposes an FPGA-based custom computation that aligns short DNA reads in a timely manner by exploiting massive concurrency at reasonable cost. The special measures taken to achieve an extremely efficient and compact mapping of the computation to a Xilinx FPGA architecture are described. The presented approach also surpasses all software heuristics in the quality of its results: it is guaranteed to find all alignment locations of a read in the database while allowing a freely adjustable character mismatch threshold. In contrast, advanced fast alignment heuristics such as Bowtie and Maq tolerate only small mismatch maximums, and their probability of detecting existing valid alignments deteriorates quickly. A performance comparison with these widely used software tools also demonstrates that the proposed FPGA computation achieves its guaranteed exact results in very competitive time.
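The task the abstract describes, a string search that tolerates a bounded number of character mismatches and reports every alignment location, can be illustrated with a minimal software sketch. This is not the paper's FPGA design: the function name `map_read` and its parameters are hypothetical, the comparison is a plain sequential scan, and only substitutions (no insertions or deletions) are assumed.

```python
def map_read(reference: str, read: str, max_mismatches: int) -> list[int]:
    """Return every 0-based offset where `read` aligns to `reference`
    with at most `max_mismatches` character mismatches.

    Illustrative assumption: mismatches are substitutions only, so each
    candidate alignment is a fixed-length window of the reference.
    """
    hits = []
    read_len = len(read)
    # Try every window of the reference that is as long as the read.
    for start in range(len(reference) - read_len + 1):
        mismatches = 0
        window = reference[start:start + read_len]
        for ref_char, read_char in zip(window, read):
            if ref_char != read_char:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # threshold exceeded, reject this offset
        else:
            # Loop finished without exceeding the threshold.
            hits.append(start)
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTACG"
    print(map_read(reference, "ACGA", 1))  # offsets within one mismatch
    print(map_read(reference, "TACG", 0))  # exact matches only
```

Because every offset is checked, the scan cannot miss a valid alignment for any chosen threshold. The cost is work proportional to reference length times read length per read, and each offset is independent of the others, which is the kind of workload that massive concurrency can address.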


Bibliographic details

Source
22nd IEEE International Conference on Application-specific Systems, Architectures and Processors | 2011 | pp. 195-201 | 7 pages
Authors
Knodel Oliver; Preusser Thomas B.; Spallek Rainer G.
Original format: PDF
CLC classification: Overall structure, system architecture
Keywords
FPGA; Sequence Alignment; Short-Read Mapping


