首页> 外文期刊>Algorithmica >Engineering a Lightweight Suffix Array Construction Algorithm
【24h】

Engineering a Lightweight Suffix Array Construction Algorithm

机译:设计轻量级后缀数组构造算法

获取原文
获取原文并翻译 | 示例

摘要

In this paper we describe a new algorithm for building the suffix array of a string. This task is equivalent to the problem of lexicographically sorting all the suffixes of the input string. Our algorithm is based on a new approach called deep–shallow sorting: we use a shallow sorter for the suffixes with a short common prefix, and a deep sorter for the suffixes with a long common prefix. All the known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings. Our algorithm has been designed to overcome this dichotomy. Our algorithm is lightweight in the sense that it uses very small space in addition to the space required by the suffix array itself. At the same time our algorithm is fast even when the input contains many repetitions: this has been shown by extensive experiments with inputs of size up to 110 Mb. The source code of our algorithm, as well as a C library providing a simple API, is available under the GNU GPL.
机译:在本文中,我们描述了一种用于构建字符串后缀数组的新算法。此任务等效于按字典顺序对输入字符串的所有后缀进行排序的问题。我们的算法基于一种称为深浅排序的新方法:对于具有短公共前缀的后缀,我们使用浅分类器;对于具有长公共前缀的后缀,我们使用深分类器。当输入字符串包含许多重复的子字符串时,用于构建后缀数组的所有已知算法要么占用大量空间,要么效率低下。我们的算法旨在克服这种二分法。我们的算法是轻量级的,因为它除了使用后缀数组本身所需的空间外,还占用很小的空间。同时,即使输入包含很多重复,我们的算法也很快:大量实验显示,输入大小最大为110 Mb,表明了这一点。 GNU GPL下提供了我们算法的源代码以及提供简单API的C库。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号