首页> 外文会议>Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on >Breaking a time-and-space barrier in constructing full-text indices
【24h】

Breaking a time-and-space barrier in constructing full-text indices

机译:突破全文索引构建的时空障碍

获取原文

摘要

Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. It has been open for a long time whether these indices can be constructed in both O(n log n) time and O(n log n)-bit working space, where n denotes the length of the text. In the literature, the fastest algorithm runs in O(n) time, while it requires O(n log n)-bit working space. On the other hand, the most space-efficient algorithm requires O(n)-bit working space while it runs in O(n log n) time. This paper breaks the long-standing time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array which takes O(n) time and O(n)-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires O(n log/sup /spl epsi/) time and O(n)-bit working space for any 0 > /spl epsi/ > 1. Apart from that, our algorithm can also be adopted to build other existing full-text indices, such as Compressed Suffix Tree, Compressed Suffix Arrays and FM-index. We also study the general case where the size of the alphabet A is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal O(n log |A|)-bit working space while running in O(n log log |A|) time and O(n log/sup /spl epsi/) time, respectively. These are the first algorithms that achieve 0(n log n) time with optimal working space, under a reasonable assumption that log |A| = o(log n).
机译:后缀树和后缀数组是最主要的全文索引,并且对其构造算法也进行了深入研究。是否可以在O(n log n)时间和O(n log n)位工作空间中构造这些索引已经很长时间了,其中n表示文本的长度。在文献中,最快的算法运行时间为O(n),而它需要O(n log n)位的工作空间。另一方面,最节省空间的算法在运行O(n log n)时需要O(n)位工作空间。本文打破了单价字RAM下长期存在的时空障碍。对于具有恒定大小的字母的文本,我们给出了一种算法,用于构造后缀数组,该后缀数组需要O(n)时间和O(n)位工作空间。请注意,时间和空间边界都是最佳的。为了构造后缀树,我们的算法需要O(n log / sup / spl epsi // n)时间和O(n)位工作空间,以用于任何0> / spl epsi />1。除此之外,我们的算法还可以还可以采用它来构建其他现有的全文本索引,例如压缩后缀树,压缩后缀数组和FM-index。我们还研究了字母A的大小不恒定的一般情况。我们的算法可以在O(n log log | A |)时间和O(n log / sup / spl epsi // n)时间。在合理的假设log | A |的前提下,这是第一种以最佳工作空间实现0(n log n)时间的算法。 = o(log n)。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号