首页> 外文期刊>Algorithmica >Linearized Suffix Tree: An Efficient Index Data Structure With The Capabilities Of Suffix Trees And Suffix Arrays
【24h】

Linearized Suffix Tree: An Efficient Index Data Structure With The Capabilities Of Suffix Trees And Suffix Arrays

机译:线性化后缀树:具有后缀树和后缀数组功能的高效索引数据结构

获取原文
获取原文并翻译 | 示例
       

摘要

Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing.Since suffix trees and suffix arrays have different capabilities,some problems are solved more efficiently using suffix trees and others are solved more efficiently using suffix arrays.We consider efficient index data structures with the capabilities of both suffix trees and suffix arrays without requiring much space.When the size of an alphabet is small,enhanced suffix arrays are such index data structures.However,when the size of an alphabet is large,enhanced suffix arrays lose the power of suffix trees.Pattern searching in an enhanced suffix array takes O(m|Σ|) time while pattern searching in a suffix tree takes O(m log |Σ|) time where m is the length of a pattern and Σ is an alphabet. In this paper,we present linearized suffix trees which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the size of an alphabet is large.A linearized suffix tree has all the functionalities of the enhanced suffix array and supports the pattern search in O(m log |Σ|) time.In a different point of view,it can be considered a practical implementation of the suffix tree supporting O(in log |Σ|)-time pattern search. In addition,we also present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree.These are the first algorithms that run in O(n) time without using the range minima query.Our experimental results show that our algorithms are faster than the previous algorithms.
机译:后缀树和后缀数组是解决字符串处理中出现的问题的基本全文索引数据结构。由于后缀树和后缀数组具有不同的功能,因此使用后缀树可以更有效地解决某些问题,而使用后缀数组可以更有效地解决一些问题。我们认为高效的索引数据结构具有后缀树和后缀数组的功能,并且不需要太多空间。当字母的大小较小时,增强的后缀数组就是这样的索引数据结构。但是,当字母的大小较大时,增强的后缀数组失去后缀树的功能。在增强的后缀数组中进行模式搜索需要O(m |Σ|)时间,而在后缀树中进行模式搜索则需要O(m log |Σ|)时间,其中m是a的长度。模式和Σ是字母。本文提出了线性化后缀树,它是高效的索引数据结构,即使在字母表大小较大的情况下也具有后缀树和后缀数组的功能。线性化后缀树具有增强后缀数组的所有功能并支持从另一角度来看,可以认为后缀树是支持O(log |Σ|)时间模式搜索的实际实现。此外,我们还提出了两种有效的算法来计算增强后缀数组和线性化后缀树上的后缀链接。这是在O(n)时间内运行的第一个不使用范围最小值查询的算法。我们的实验结果表明,算法比以前的算法快。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号