...
首页> 外文期刊>ACM transactions on algorithms >Fully compressed suffix trees
【24h】

Fully compressed suffix trees

机译:完全压缩的后缀树

获取原文
获取原文并翻译 | 示例

摘要

Suffix trees are by far themost important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require (nlog n) bits of space, for a string of size n. This is considerably more than the nlog _2 σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus (n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when σ is small as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this (n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST also supports several operations. In particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.
机译:后缀树是迄今为止字符串学中最重要的数据结构,在生物信息学和信息检索等领域中有无数的应用。对于大小为n的字符串,后缀树的经典表示形式需要(nlog n)位空间。这远远大于字符串本身所需的nlog _2σ位,其中σ是字母大小。后缀树的大小一直是其在实践中被广泛采用的障碍。最近的压缩后缀树表示形式只需要压缩字符串的空间加上(n)个额外位。这已经很壮观了,但是当σ小于DNA序列时,线性额外位仍然不能令人满意。在本文中,我们介绍了第一个压缩后缀树表示形式,该表示形式打破了此(n)位空间障碍。完全压缩后缀树(FCST)表示仅需要压缩文本大小顶部的亚线性空间,并且几乎在对数时间内支持大量导航操作。这包括提取任意文本子字符串,因此FCST使用与压缩文本几乎相同的空间替换文本。 FCST的必要组成部分是最低的祖先(LCA)操作。我们揭示了LCA与后缀树导航之间的重要联系。我们还将介绍如何使FCST动态化,即支持文本更新。动态FCST还支持多种操作。特别是,它可以在每个符号的最佳空间和多对数时间内构建静态FCST。我们的理论结果也通过实验得到验证,表明FCST在实践中也非常有效。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号