We introduce two succinct data structures for solving various string problems. The first stores lcp information, the lengths of the longest common prefixes between suffixes in the suffix array. The second is an improvement of the compressed suffix array that supports linear-time counting queries for any pattern. For a text of length n, the first occupies only 2n + o(n) bits to compute the lcp between lexicographically adjacent suffixes in constant time, and 6n + o(n) bits to compute it between any two suffixes. No previous data structure in the literature attained linear size. The second has size proportional to the text size and is applicable to texts over any alphabet Σ such that |Σ| = log^{O(1)} n. These space-economical data structures are useful for processing huge amounts of text data.
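The abstract does not spell out how the 2n-bit bound is reached. One standard way to get there, sketched below in Python as an illustration rather than as the paper's exact construction, is to store the lcp values in text order: if plcp[i] is the lcp between suffix i and its lexicographic predecessor, then plcp[i] + i is non-decreasing and at most n, so the sequence can be written in unary with at most 2n bits. All function names here are hypothetical, the suffix array is built naively, and `select1` is a linear scan standing in for the o(n)-bit constant-time select structure a real implementation would use.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; fine for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_len(text, i, j):
    # Length of the longest common prefix of text[i:] and text[j:].
    k = 0
    while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
        k += 1
    return k


def build_lcp_bits(text, sa):
    # plcp[i] = lcp of suffix i with its lexicographic predecessor (0 if none).
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    plcp = [lcp_len(text, i, sa[rank[i] - 1]) if rank[i] > 0 else 0
            for i in range(n)]
    # Unary-encode the non-decreasing sequence v_i = plcp[i] + i:
    # write (v_i - v_{i-1}) zeros followed by a single one.
    bits, prev = [], 0
    for i in range(n):
        v = plcp[i] + i
        bits.extend([0] * (v - prev))
        bits.append(1)
        prev = v
    return bits


def select1(bits, i):
    # Position of the i-th one (0-indexed). Linear scan here; an assumption
    # standing in for a succinct constant-time select structure.
    seen = -1
    for pos, b in enumerate(bits):
        seen += b
        if b and seen == i:
            return pos
    raise IndexError("not enough ones")


def adjacent_lcp(bits, sa, r):
    # lcp between the suffixes of rank r-1 and r, for r >= 1.
    i = sa[r]
    # Zeros before the i-th one equal plcp[i] + i, so plcp[i] = pos - 2*i.
    return select1(bits, i) - 2 * i
```

For example, with `text = "banana"`, `sa = suffix_array(text)` and `bits = build_lcp_bits(text, sa)`, the call `adjacent_lcp(bits, sa, r)` recovers the lcp between the suffixes of rank `r - 1` and `r`. Note that the query still needs `sa[r]`, which in the succinct setting would come from the compressed suffix array rather than a plain list.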