A space-efficient construction of the Burrows-Wheeler transform for genomic data

Lippert RA; Mobarry CM; Walenz BP

首页> 外文期刊>Journal of computational biology: A journal of computational molecular cell biology >A space-efficient construction of the Burrows-Wheeler transform for genomic data

【24h】

A space-efficient construction of the Burrows-Wheeler transform for genomic data

机译：基因组数据的Burrows-Wheeler转换的节省空间的构造

获取原文

获取原文并翻译 | 示例

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures which support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, more space-efficient approaches to exact matching are becoming more important. One such data structure, the compressed suffix array (CSA), based on the Burrows - Wheeler transform, has been shown to require memory which is nearly equal to the memory requirements of the original database, while supporting common sorts of query problems time efficiently. However, building a CSA from a sequence in efficient space and time is challenging. In 2002, the first space-efficient CSA construction algorithm was presented. That implementation used ( 1+ 2 log(2) |Sigma|)( 1+ epsilon) bits per character ( where epsilon is a small fraction). The construction algorithm ran in as much as twice that space, in O(|Sigma| n log(n)) time. We have created an implementation which can also achieve these asymptotic bounds, but for small alphabets, and only uses 1/2 ( 1+|Sigma|)( 1+ epsilon) bits per character, a factor of 2 less space for nucleotide alphabets. We present time and space results for the CSA construction and querying of our implementation on publicly available genome data which demonstrate the practicality of this approach.

机译：精确字符串匹配的算法在计算生物学中具有重要的应用。支持后继树和后缀数组等各种精确的字符串匹配查询的省时数据结构已应用于此类问题。随着序列数据库的增长，更节省空间的精确匹配方法变得越来越重要。一种这样的数据结构，即基于Burrows-Wheeler变换的压缩后缀数组（CSA），显示需要的存储空间几乎等于原始数据库的存储空间，同时又能有效地支持常见的查询问题。但是，从有效的空间和时间序列中构建CSA颇具挑战性。在2002年，提出了第一个节省空间的CSA构造算法。该实现每个字符使用（1+ 2 log（2）| Sigma |）（1+ epsilon）位（其中epsilon是一小部分）。构造算法在O（| Sigma | n log（n））时间内运行的空间是该空间的两倍。我们创建了一个也可以实现这些渐近边界的实现，但是对于小字母，每个字符仅使用1/2（1+ | Sigma |）（1+ epsilon）位，因此核苷酸字母的空间减少了2倍。我们提供了CSA构建的时间和空间结果，并在公开的基因组数据上查询了我们的实现方式，这证明了该方法的实用性。

著录项

来源
《Journal of computational biology: A journal of computational molecular cell biology》 |2005年第7期|共9页
作者
Lippert RA; Mobarry CM; Walenz BP;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类普通生物学;
关键词
Burrows-Wheeler; suffix array; string matching; comparative genomics; SUFFIX ARRAY CONSTRUCTION;

机译：Burrows-Wheeler;后缀数组;字符串匹配;比较基因组学;SUFFIX ARRAY CONSTRUCTION;
入库时间 2022-08-19 00:23:13

相似文献

外文文献
中文文献
专利

1. A space-efficient construction of the Burrows-Wheeler transform for genomic data [J] . Lippert RA, Mobarry CM, Walenz BP Journal of computational biology: A journal of computational molecular cell biology . 2005,第7期

机译：基因组数据的Burrows-Wheeler转换的节省空间的构造
2. Analysis of genomic rearrangements by using the Burrows-Wheeler transform of short-read data [J] . Kouichi Kimura, Asako Koike BMC Bioinformatics . 2015,第SUPPLEMENTa18期

机译：利用短读数据的Burrows-Wheeler变换分析基因组重排
3. Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform [J] . Cox Anthony J., Bauer Markus J., Jakobi Tobias, Bioinformatics . 2012,第11期

机译：利用Burrows-Wheeler变换大规模压缩基因组序列数据库
4. Space-Efficient Construction of the Burrows-Wheeler Transform [C] . Timo Beller, Maike Zwerger, Simon Gog, International symposium on string processing and information retrieval . 2013

机译：Burrows-Wheeler变换的节省空间构造
5. Parallel lossless data compression based on the Burrows-Wheeler transform. [D] . Gilchrist, Jeffrey S. 2007

机译：基于Burrows-Wheeler变换的并行无损数据压缩。
6. Analysis of genomic rearrangements by using the Burrows-Wheeler transform of short-read data [O] . Kouichi Kimura, Asako Koike 2015

机译：利用短读数据的Burrows-Wheeler变换分析基因组重排
7. Analysis of genomic rearrangements by using the Burrows-Wheeler transform of short-read data [O] . Kouichi Kimura, Asako Koike 2015

机译：利用短读数据的Burrows-Wheeler变换分析基因组重排
8. Construction of an integrated database to support genomic sequence analysis [R] . Gillevet, P., Overbeek, R. 1999

机译：构建综合数据库以支持基因组序列分析

A space-efficient construction of the Burrows-Wheeler transform for genomic data

摘要

著录项

相似文献

相关主题

期刊订阅