Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

机译：简洁排列的最长公共前缀数组的低空间外部存储器构造

获取原文

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n) words (i.e. O(n log n) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n bit version of the LCP array which uses O(n log σ) bits of additional space in external memory when given a (compressed) BWT with alphabet size σ and a sampled inverse suffix array at sampling rate O(log n). This is often a significant space gain in practice where σ is usually much smaller than n or even constant. The algorithm has average run-time O(n log n log σ) and worst case run-time O(n~2 log σ). It can be improved to O(n log~2 n log σ) worst case time while keeping the same space bound in external memory if O(n/log n) bits of internal memory are available. We also present experimental data showing that our approach is practical.

机译：最长公共前缀（LCP）数组是索引字符串匹配中的通用辅助数据结构。它可以用于使用后缀数组（SA）加快搜索速度，并提供底层后缀树的拓扑结构的隐式表示。长度为n的字符串的LCP数组可以表示为长度为n的单词的数组，或者在存在SA的情况下，表示为2n位的位向量加上渐近可忽略的支持数据结构。已经提出了用于LCP阵列的外部存储器构造算法，但是到目前为止提出的那些算法在外部存储器中具有O（n）个字（即，O（nlog n）位）的空间要求。在某些实际情况下，这种空间需求非常昂贵。我们提出了一种外部存储算法，用于构造LCP阵列的2n位版本，当给定一个（压缩的）字母大小为σ的BWT并在采样时采样一个反向后缀数组时，该算法将使用外部存储器中O（n logσ）位的额外空间比率O（log n）。在实践中，这通常是一个显着的空间增益，其中σ通常比n小得多，甚至恒定。该算法具有平均运行时间O（n log n logσ）和最坏情况下的运行时间O（n〜2 logσ）。如果内部存储器的O（n / log n）位可用，则可以在最坏情况下将时间改进为O（n log〜2 n logσ），同时在外部存储器中保留相同的空间。我们还提供了实验数据，表明我们的方法是可行的。

著录项

来源
《International symposium on string processing and information retrieval;Workshop on compression, text, and algorithms》|2016年|178-190|共13页
会议地点
作者
German Tischler;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Generalized enhanced suffix array construction in external memory [J] . Felipe A. Louza, Guilherme P. Telles, Steve Hoffmann, Algorithms for Molecular Biology . 2017,第1期

机译：外部存储器中的通用增强后缀数组构造
2. Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet [J] . Juha K{}rkk{}inen, Dominik Kempa LIPIcs : Leibniz International Proceedings in Informatics . 2017,第1期

机译：工程外部存储器LCP阵列构造：并行，就地和大字母
3. LCP Array Construction in External Memory [J] . JUHA KAERKKAEINEN, DOMINIK KEMPA Journal of experimental algorithmics . 2016,第1期

机译：外部存储器中的LCP阵列构造
4. Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array [C] . German Tischler International Symposium on String Processing and Information Retrieval . 2016

机译：低空间外部存储器构建简洁允许的最长通用前缀数组
5. Parallel external memory suffix array construction. [D] . Walia, Nancy. 2009

机译：并行外部存储器后缀数组构造。
6. Generalized enhanced suffix array construction in external memory [O] . Felipe A. Louza, Guilherme P. Telles, Steve Hoffmann, 2017

机译：外部存储器中的通用增强后缀数组构造
7. Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array [O] . German Tischler 2016

机译：低空间外部存储器构建简洁允许的最长通用前缀数组
8. Optimum Synchronization Codes to Follow an Alternating Mark/Space Prefix [R] . Coleman, J. O. 1981

机译：遵循交替标记/空格前缀的最佳同步代码

Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅