PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment

Abstract

Advances in next-generation sequencing have had a major impact on medical and genomic research. This technology can now produce billions of short DNA fragments (reads) in a single run. One of the most demanding computational problems faced by almost every sequencing pipeline is short-read alignment, i.e., determining where in the original genome each fragment originated. Most current solutions are based on a seed-and-extend approach, in which promising candidate regions (seeds) are first identified and then extended to verify whether a full high-scoring alignment actually exists in the vicinity of each seed. Seed verification is the main bottleneck in many state-of-the-art aligners, so fast solutions are of high importance. We present a parallel ungapped-alignment-featured seed verification (PUNAS) algorithm, a fast filter that removes the majority of false-positive seeds and thus significantly accelerates the short-read alignment process. PUNAS is based on bit-parallelism and takes advantage of the SIMD vector units of modern microprocessors. Our implementation employs a vectorize-and-scale approach supporting multi-core CPUs and many-core Knights Landing (KNL)-based Xeon Phi processors. Performance evaluation reveals that PUNAS is over three orders of magnitude faster than seed verification with the Smith-Waterman algorithm and around one order of magnitude faster than seed verification with the banded version of Myers' bit-vector algorithm. Using a single thread, it achieves speedups of up to 7.3, 27.1, and 11.6 over the shifted Hamming distance filter on SSE-, AVX2-, and AVX-512-based CPU/KNL platforms, respectively. The speed of our framework further scales almost linearly with the number of cores. PUNAS is open-source software available at https://github.com/Xu-Kai/PUNASfilter.
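
The abstract does not describe the internals of PUNAS, so the following is only a generic illustration of what a bit-parallel, ungapped seed filter can look like. It is not the authors' algorithm. It assumes a 2-bit base encoding, packs each sequence into one integer, and counts mismatches with XOR and a population count. The names `encode`, `mismatches`, and `passes_filter` are hypothetical, and a real implementation would work on fixed-width SIMD registers rather than Python integers.

```python
# Illustrative sketch only: a generic bit-parallel ungapped mismatch count.
# This is NOT the PUNAS implementation; encoding and names are assumptions.

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # assumed 2-bit encoding


def encode(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base."""
    word = 0
    for base in seq:
        word = (word << 2) | CODE[base]
    return word


def mismatches(read: str, ref: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(read) != len(ref):
        raise ValueError("ungapped comparison needs equal lengths")
    if not read:
        return 0
    diff = encode(read) ^ encode(ref)           # non-zero 2-bit group = mismatch
    low_mask = int("01" * len(read), 2)         # selects the low bit of each group
    per_base = (diff | (diff >> 1)) & low_mask  # collapse each group to one bit
    return bin(per_base).count("1")             # population count


def passes_filter(read: str, ref: str, max_mismatches: int) -> bool:
    """Keep a candidate seed only if the ungapped comparison is close enough."""
    return mismatches(read, ref) <= max_mismatches
```

A candidate region that fails `passes_filter` would be discarded before any expensive gapped alignment is attempted; the threshold `max_mismatches` is a free parameter in this sketch.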

Bibliographic record

Source
IEEE International Parallel and Distributed Processing Symposium | 2017 | pp. 52-61 | 10 pages
Authors
Yuandong Chan; Kai Xu; Haidong Lan; Weiguo Liu; Yongchao Liu; Bertil Schmidt
Original format: PDF
Keywords
Bioinformatics; Genomics; Tools; Sequential analysis; Field programmable gate arrays; Acceleration; Indexes
