Breaking a time-and-space barrier in constructing full-text indices


Abstract

Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. It has long been open whether these indices can be constructed in both o(n log n) time and o(n log n)-bit working space, where n denotes the length of the text. In the literature, the fastest algorithm runs in O(n) time but requires O(n log n)-bit working space. On the other hand, the most space-efficient algorithm requires O(n)-bit working space but runs in O(n log n) time. This paper breaks the long-standing time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array which takes O(n) time and O(n)-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires O(n log^ε n) time and O(n)-bit working space for any 0 < ε < 1. Apart from that, our algorithm can also be adapted to build other existing full-text indices, such as the Compressed Suffix Tree, Compressed Suffix Arrays and the FM-index. We also study the general case where the size of the alphabet A is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal O(n log |A|)-bit working space while running in O(n log log |A|) time and O(n log^ε n) time, respectively. These are the first algorithms that achieve o(n log n) time with optimal working space, under the reasonable assumption that log |A| = o(log n).
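
To make the objects in the abstract concrete, here is a minimal Python sketch of what a suffix array is and why its plain representation occupies on the order of n log n bits while the text itself occupies n log |A| bits. This is not the paper's algorithm: it sorts suffixes naively, which is far slower than the bounds stated above. The function names `naive_suffix_array`, `suffix_array_bits` and `text_bits` are hypothetical and chosen for illustration only.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Illustrative only: comparing suffix slices directly is much slower than
    the linear-time construction the abstract describes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_bits(n: int) -> int:
    """Bits needed to store n positions at ceil(log2 n) bits each (n >= 2)."""
    return n * (n - 1).bit_length()


def text_bits(n: int, alphabet_size: int) -> int:
    """Bits needed to store n characters at ceil(log2 |A|) bits each (|A| >= 2)."""
    return n * (alphabet_size - 1).bit_length()


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
    print(naive_suffix_array("banana"))   # [5, 3, 1, 0, 4, 2]

    # Assumed example sizes: a text of one million characters over a
    # 4-letter alphabet (e.g. DNA).
    n, sigma = 10**6, 4
    print(suffix_array_bits(n))           # 20000000 bits for the plain index
    print(text_bits(n, sigma))            # 2000000 bits for the text itself
```

With these assumed sizes the plain suffix array is ten times larger than the text, which is the gap that an O(n)-bit (or O(n log |A|)-bit) working space bound avoids.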

Bibliographic record

Source
Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on | 2003 | pp. 251-260 | 10 pages
Authors
Wing-Kai Hon; Sadakane K.; Wing-Kin Sung
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords
tree data structures; computational complexity; trees (mathematics); indexing; text analysis; time-and-space barrier; full-text index; construction algorithm; space-efficient algorithm; unit-cost word RAM; compressed suffix tree; compressed suffix array;

Similar literature

1. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices [J]. Wing-Kai Hon, Kunihiko Sadakane, Wing-Kin Sung. SIAM Journal on Computing. 2009, No. 6
2. The Mechanism Analysis of Natural Language Texts in Order to Construct A Model of the Full-text Document [J]. A.S. Lebedev. Science and Technology. 2013, No. 2A
3. Constructing a Thesaurus for Information Retrieval in Full-Text Databases [J]. S. V. Zhmailo. Automatic Documentation and Mathematical Linguistics. 2006, No. 5
4. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices [C]. Wing-Kai Hon, Kunihiko Sadakane, Wing-Kin Sung. Annual IEEE Symposium on Foundations of Computer Science. 2003
5. Breaking the Mucosal Barrier: Investigating the Role of MicroRNA in the Disruption of the Intestinal Epithelial Barrier During SIV Infection [D]. Gaulke, Christopher Andrew. 2014
6. Breaking Barriers. New Insights into Airway Epithelial Barrier Function in Health and Disease [O]. Fariba Rezaee, Steve N. Georas
7. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices [O]. Wing-Kai Hon, Kunihiko Sadakane, Wing-Kin Sung. 2008
