【24h】

A Compressed Self-index Using a Ziv-Lempel Dictionary

机译:使用Ziv-Lempel字典的压缩自索引

获取原文
获取原文并翻译 | 示例

摘要

A compressed full-text self-index for a text T, of size u, is a data structure used to search patterns P, of size m, in T that requires reduced space, i.e. that depends on the empirical entropy (H_κ, H_0) of T, and is, furthermore, able to reproduce any substring of T. In this paper we present a new compressed self-index able to locate the occurrences of P in O((m + occ) log n) time, where occ is the number of occurrences and σ the size of the alphabet of T. The fundamental improvement over previous LZ78 based indexes is the reduction of the search time dependency on m from O(m~2) to O(m). To achieve this result we point out the main obstacle to linear time algorithms based on LZ78 data compression and expose and explore the nature of a recurrent structure in LZ-indexes, the T_(78) suffix tree. We show that our method is very competitive in practice by comparing it against the LZ-Index, the FM-index and a compressed suffix array.
机译:大小为u的文本T的压缩的全文本索引是一种数据结构,用于在T中搜索大小为m的模式P,该模式需要减小的空间,即取决于经验熵(H_κ,H_0) T,并且能够重现T的任何子串。在本文中,我们提出了一个新的压缩自索引,它能够在O((m + occ)log n)时间中定位P的出现,其中occ是T的出现次数和σ的大小。相对于以前的基于LZ78的索引,根本的改进是将对m的搜索时间依赖性从O(m〜2)减少到O(m)。为了获得此结果,我们指出了基于LZ78数据压缩的线性时间算法的主要障碍,并揭示和探索了LZ索引的后缀结构T_(78)后缀树的性质。通过将其与LZ索引,FM索引和压缩后缀数组进行比较,我们证明了该方法在实践中非常有竞争力。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号