首页> 外文会议>International Conference on String Processing and Information Retrieval >An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size
【24h】

An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size

机译:具有后缀树和后缀阵列的有效索引数据结构,用于非可忽略的尺寸的字母表

获取原文

摘要

The suffix tree and the suffix array are fundamental full-text index data structures and many algorithms have been developed on them to solve problems occurring in string processing and information retrieval. Some problems are solved more efficiently using the suffix tree and others are solved more efficiently using the suffix array. We consider the index data structure with the capabilities of both the suffix tree and the suffix array without requiring much space. For the alphabets whose size is negligible, Abouelhoda et al. developed the enhance suffix axray for this purpose. It consists of the suffix array and the child table. The child table stores the parent-child relationship between the nodes in the suffix tree so that every algorithm developed on the suffix tree can be run with a small and systematic modification. Since the child table consumes moderate space and is constructed very fast, the enhanced suffix array is almost as time/space-efficient as the suffix axray. However, when the size of the alphabet is not negligible, the enhance suffix array loses the capabilities of the suffix tree. The pattern search in the enhanced suffix array takes O(m |Σ|) time where m is the length of the pattern and Σ is the alphabet, while the pattern search in the suffix tree takes O(m log |Σ|) time. In this paper, we improve the enhanced suffix array to have the capabilities of the suffix tree and the suffix array even when the size of the alphabet is not negligible. We do this by presenting a new child table, which improves the enhanced suffix array to support the pattern search in O(m log |Σ|) time. Our index data structure is almost as time/space-efficient as the enhanced suffix array. It consumes the same space as the enhanced suffix array and its construction time is slightly slower (< 4%) than that of the enhanced suffix array. In a different point of view, it can be considered the first practical one facilitating the capabilities of suffix trees when the size of the alphabet is not negligible because the suffix tree supporting O(m log |Σ|)-time pattern search is not easy to implement and thus it is rarely used in practice.
机译:后缀树和后缀数组是基本的全文索引数据结构,并且已经开发了许多算法,以解决字符串处理和信息检索中发生的问题。使用后缀树更有效地解决了一些问题,并且其他问题使用后缀阵列更有效地解决。我们考虑具有后缀树和后缀数组的功能的索引数据结构,而无需大量空间。对于尺寸可以忽略不计的字母,abouelhoda等。为此开发了增强后缀Axray。它由后缀数组和子表组成。子表存储后缀树中的节点之间的父子关系,以便在后缀树上开发的每个算法都可以使用小而系统的修改来运行。由于子表消耗了适度的空间并且构造得非常快,因此增强的后缀阵列几乎与后缀Axray一样时间/空间效率。但是,当字母表的大小不可忽略时,增强后缀阵列丢失后缀树的能力。增强型后缀数组中的模式搜索需要m是m是图案的长度的O(m |σ|),而σ是字母表,而后缀树中的模式搜索需要O(m log |σ|)时间。在本文中,即使字母表的大小不可忽略,我们也会改进增强的后缀阵列以具有后缀树和后缀数组的功能。我们通过呈现一个新的子表来完成此操作,这改善了增强的后缀数组来支持O(m log |σ|)时间的模式搜索。我们的索引数据结构几乎是时间/空间效率为增强的后缀数组。它消耗了与增强型后缀阵列相同的空间,其施工时间略微慢(<4%),而不是增强的后缀阵列。在不同的角度来看,它可以被认为是当字母表的大小不可忽略时,因为支持O(m log |σ|) - time模式搜索的后缀树不可忽略时,它可以被认为是促进后缀树的能力。不容易实施,因此它很少在实践中使用。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号