首页> 外文期刊>IEEE/ACM transactions on computational biology and bioinformatics >Parallel Computation of the Burrows-Wheeler Transform of Short Reads Using Prefix Parallelism
【24h】

Parallel Computation of the Burrows-Wheeler Transform of Short Reads Using Prefix Parallelism

机译:使用前缀并行度对短读的Burrows-Wheeler变换进行并行计算

获取原文
获取原文并翻译 | 示例

摘要

The Burrows-Wheeler transform (BWT) of short-read data has unexplored potential utilities, such as for efficient and sensitive variation analysis against multiple reference genome sequences, because it does not depend on any particular reference genome sequence, unlike conventional mapping-based methods. However, since the amount of read data is generally much larger than the size of the reference sequence, computation of the BWT of reads is not easy, and this hampers development of potential applications. For the alleviation of this problem, a new method of computing the BWT of reads in parallel is proposed. The BWT, corresponding to a sorted list of suffixes of reads, is constructed incrementally by successively including longer and longer suffixes. The working data is divided into more than 10,000 "blocks" corresponding to sublists of suffixes with the same prefixes. Thousands of groups of blocks can be processed in parallel while making exclusive writes and concurrent reads into a shared memory. Reads and writes are basically sequential, and the read concurrency is limited to two. Thus, a fine-grained parallelism, referred to as prefix parallelism, is expected to work efficiently. The time complexity for processing n reads of length l is O(nl(2)). On actual biological DNA sequence data of about 100 Gbp with a read length of 100 bp (base pairs), a tentative implementation of the proposed method took less than an hour on a single-node computer; i.e., it was about three times faster than one of the fastest programs developed so far.
机译:短读取数据的Burrows-Wheeler变换(BWT)具有未开发的潜在效用,例如用于针对多个参考基因组序列进行高效而敏感的变异分析,因为它不依赖于任何特定的参考基因组序列,这与传统的基于映射的方法不同。但是,由于读取的数据量通常远大于参考序列的大小,因此读取的BWT的计算并不容易,并且这阻碍了潜在应用程序的开发。为了缓解这个问题,提出了一种计算并行读取的BWT的新方法。通过依次包含越来越长的后缀来递增地构造与读取后缀的排序列表相对应的BWT。工作数据被分为超过10,000个“块”,对应于具有相同前缀的后缀子列表。数以千计的块组可以并行处理,同时对共享内存进行独占写入和并发读取。读取和写入基本上是顺序的,并且读取并发限制为两个。因此,期望细粒度的并行性(称为前缀并行性)可以有效地工作。处理n个长度为1的读段的时间复杂度为O(nl(2))。在大约100 Gbp的实际生物DNA序列数据和100 bp的读取长度(碱基对)的基础上,该方法的暂定实现在单节点计算机上花费了不到一个小时的时间;也就是说,它比到目前为止开发的最快的程序之一快三倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号