首页> 外文会议>International symposium on string processing and information retrieval;Workshop on compression, text, and algorithms >Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
【24h】

Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

机译:简洁排列的最长公共前缀数组的低空间外部存储器构造

获取原文

摘要

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n) words (i.e. O(n log n) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n bit version of the LCP array which uses O(n log σ) bits of additional space in external memory when given a (compressed) BWT with alphabet size σ and a sampled inverse suffix array at sampling rate O(log n). This is often a significant space gain in practice where σ is usually much smaller than n or even constant. The algorithm has average run-time O(n log n log σ) and worst case run-time O(n~2 log σ). It can be improved to O(n log~2 n log σ) worst case time while keeping the same space bound in external memory if O(n/log n) bits of internal memory are available. We also present experimental data showing that our approach is practical.
机译:最长公共前缀(LCP)数组是索引字符串匹配中的通用辅助数据结构。它可以用于使用后缀数组(SA)加快搜索速度,并提供底层后缀树的拓扑结构的隐式表示。长度为n的字符串的LCP数组可以表示为长度为n的单词的数组,或者在存在SA的情况下,表示为2n位的位向量加上渐近可忽略的支持数据结构。已经提出了用于LCP阵列的外部存储器构造算法,但是到目前为止提出的那些算法在外部存储器中具有O(n)个字(即,O(nlog n)位)的空间要求。在某些实际情况下,这种空间需求非常昂贵。我们提出了一种外部存储算法,用于构造LCP阵列的2n位版本,当给定一个(压缩的)字母大小为σ的BWT并在采样时采样一个反向后缀数组时,该算法将使用外部存储器中O(n logσ)位的额外空间比率O(log n)。在实践中,这通常是一个显着的空间增益,其中σ通常比n小得多,甚至恒定。该算法具有平均运行时间O(n log n logσ)和最坏情况下的运行时间O(n〜2 logσ)。如果内部存储器的O(n / log n)位可用,则可以在最坏情况下将时间改进为O(n log〜2 n logσ),同时在外部存储器中保留相同的空间。我们还提供了实验数据,表明我们的方法是可行的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号