Engineering a Lightweight Suffix Array Construction Algorithm

Giovanni Manzini; Paolo Ferragina

Abstract

In this paper we describe a new algorithm for building the suffix array of a string. This task is equivalent to the problem of lexicographically sorting all the suffixes of the input string. Our algorithm is based on a new approach called deep–shallow sorting: we use a shallow sorter for the suffixes with a short common prefix, and a deep sorter for the suffixes with a long common prefix. All the known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings. Our algorithm has been designed to overcome this dichotomy. Our algorithm is lightweight in the sense that it uses very small space in addition to the space required by the suffix array itself. At the same time our algorithm is fast even when the input contains many repetitions: this has been shown by extensive experiments with inputs of size up to 110 Mb. The source code of our algorithm, as well as a C library providing a simple API, is available under the GNU GPL.
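The equivalence between suffix array construction and suffix sorting, and the split between a shallow and a deep sorter, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `depth_limit` parameter, and the plain comparison sort used as the "deep" phase are all assumptions made for illustration, and the sketch makes no attempt at the space economy the abstract describes.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return suffix start positions in lexicographic order of the suffixes.

    Baseline only: it materialises every suffix as a sort key, so it is
    neither lightweight nor fast on repetitive input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def deep_shallow_toy(text: str, depth_limit: int = 4) -> list[int]:
    """Toy illustration of the deep/shallow split (hypothetical helper).

    Shallow phase: order suffixes by their first `depth_limit` characters.
    Deep phase: only groups of suffixes that agree on all of those
    characters (a long common prefix) get further work.
    """
    n = len(text)

    def prefix(i: int) -> str:
        return text[i:i + depth_limit]

    # Shallow phase: compare at most depth_limit characters per suffix.
    order = sorted(range(n), key=prefix)

    result: list[int] = []
    start = 0
    while start < n:
        # Find the run of suffixes sharing the same truncated prefix.
        end = start + 1
        while end < n and prefix(order[end]) == prefix(order[start]):
            end += 1
        group = order[start:end]
        if len(group) > 1:
            # Deep phase stand-in (assumption): a plain sort on the
            # remainder of each suffix, skipping the shared prefix.
            group.sort(key=lambda i: text[i + depth_limit:])
        result.extend(group)
        start = end
    return result


if __name__ == "__main__":
    print(naive_suffix_array("banana"))   # [5, 3, 1, 0, 4, 2]
    print(deep_shallow_toy("banana", 2))  # [5, 3, 1, 0, 4, 2]
```

On input with few repetitions, most groups left by the shallow phase contain a single suffix and the deep phase is rarely entered. On highly repetitive input the groups grow large, which is the case where a dedicated deep sorter matters.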

Publication details

Source
Algorithmica, 2004, Issue 1, pp. 33-50 (18 pages)
Authors
Giovanni Manzini; Paolo Ferragina
Author affiliations

Dipartimento di Informatica, Universita del Piemonte Orientale, Alessandria, Italy, and IIT-CNR, Pisa, Italy;

Dipartimento di Informatica, Universita di Pisa, Pisa, Italy;

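Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords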
Suffix array; Algorithmic engineering; Space-economical algorithms; Full-text index; Suffix tree;

Similar literature

1. A bioinformatician’s guide to the forefront of suffix array construction algorithms [J]. Anish Man Singh Shrestha, Martin C. Frith, Paul Horton. Briefings in Bioinformatics, 2014, Issue 2.
2. Two Efficient Algorithms for Linear Time Suffix Array Construction [J]. Nong Ge, Zhang Sen, Chan Wai Hong. IEEE Transactions on Computers, 2011, Issue 10.
3. A Taxonomy of Suffix Array Construction Algorithms [J]. Simon J. Puglisi, W. F. Smyth, Andrew H. Turpin. ACM Computing Surveys, 2007, Issue 2.
4. Engineering a Lightweight Suffix Array Construction Algorithm [C]. Giovanni Manzini, Paolo Ferragina. 10th Annual European Symposium on Algorithms (ESA 2002), Sep 17-21, 2002, Rome, Italy. 2002.
5. Parallel external memory suffix array construction [D]. Walia, Nancy. 2009.
6. A bioinformatician’s guide to the forefront of suffix array construction algorithms [O]. Anish Man Singh Shrestha, Martin C. Frith.
7. Engineering a lightweight suffix array construction algorithm [O]. G. Manzini, P. Ferragina. 2004.
