An Alphabet-Friendly FM-Index

Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index that scales well with the size of the input alphabet $\Sigma$. The size of the new index built on a string $T[1,n]$ is bounded by $nH_k(T) + O\big((n \log\log n)/\log_{|\Sigma|} n\big)$ bits, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. This bound holds simultaneously for all $k \le \alpha \log_{|\Sigma|} n$ and $0 < \alpha < 1$. Moreover, the index design does not depend on the parameter $k$, which plays a role only in the analysis of the space occupancy. Using our index, counting the occurrences of an arbitrary pattern $P[1,p]$ as a substring of $T$ takes $O(p \log|\Sigma|)$ time. Locating each pattern occurrence takes $O\big(\log|\Sigma| \cdot \log^2 n/\log\log n\big)$ time. Reporting a text substring of length $\ell$ takes $O\big((\ell + \log^2 n/\log\log n)\log|\Sigma|\big)$ time.
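To illustrate where the $\log|\Sigma|$ factors come from, here is a minimal sketch (not the authors' implementation) of an FM-index whose BWT is stored in a balanced wavelet tree. Every rank or access walks one root-to-leaf path of $\lceil\log_2|\Sigma|\rceil$ levels, so backward search over $P[1,p]$ performs $O(p)$ such walks. The class names `WaveletTree` and `FMIndex` and the parameter `sample_rate` are hypothetical. Plain Python prefix-count lists stand in for compressed bitvectors, suffixes are sorted naively, and the compression boosting step that yields the $nH_k(T)$ space bound is not reproduced.

```python
class WaveletTree:
    """Balanced wavelet tree over integer codes lo..hi (hypothetical helper).
    Each internal node stores one bit per symbol: 1 if its code is in the
    upper half of the node's range. rank/access take O(log sigma) steps."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        if lo == hi or not seq:
            return  # leaf or empty subtree: no bitvector needed
        mid = (lo + hi) // 2
        # prefix[i] = number of 1-bits among the first i symbols. This plain
        # list stands in for a compressed bitvector with a rank directory.
        self.prefix = [0]
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s > mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def rank(self, c, i):
        """Number of occurrences of code c among the first i symbols."""
        node = self
        while True:
            if i == 0 or not node.lo <= c <= node.hi:
                return 0
            if node.lo == node.hi:
                return i
            ones = node.prefix[i]
            if c <= (node.lo + node.hi) // 2:
                node, i = node.left, i - ones
            else:
                node, i = node.right, ones

    def access(self, i):
        """Code of the symbol at 0-based position i."""
        node = self
        while node.lo != node.hi:
            ones = node.prefix[i]
            if node.prefix[i + 1] > ones:  # bit 1: go to the upper half
                node, i = node.right, ones
            else:
                node, i = node.left, i - ones
        return node.lo


class FMIndex:
    def __init__(self, text, sample_rate=4):
        text += "\0"  # sentinel: assumed absent from the input, sorts first
        self.n = len(text)
        # Naive O(n^2 log n) suffix sorting: fine for a demo only.
        sa = sorted(range(self.n), key=lambda i: text[i:])
        alphabet = sorted(set(text))
        self.code = {ch: k for k, ch in enumerate(alphabet)}  # codes 0..sigma-1
        bwt = [self.code[text[i - 1]] for i in sa]  # text[-1] is the sentinel
        self.wt = WaveletTree(bwt, 0, len(alphabet) - 1)
        self.C, total = [], 0  # C[k] = number of symbols with code < k
        for ch in alphabet:
            self.C.append(total)
            total += text.count(ch)
        # Keep suffix-array entries only for text positions divisible by sample_rate.
        self.samples = {row: p for row, p in enumerate(sa) if p % sample_rate == 0}

    def _range(self, pattern):
        """Backward search: rows [lo, hi) whose suffixes start with pattern."""
        lo, hi = 0, self.n
        for ch in reversed(pattern):
            c = self.code.get(ch)
            if c is None:
                return 0, 0
            lo = self.C[c] + self.wt.rank(c, lo)
            hi = self.C[c] + self.wt.rank(c, hi)
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern):
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern):
        lo, hi = self._range(pattern)
        out = []
        for row in range(lo, hi):
            steps = 0
            while row not in self.samples:  # step backwards in T via LF
                c = self.wt.access(row)
                row = self.C[c] + self.wt.rank(c, row)
                steps += 1
            out.append(self.samples[row] + steps)
        return sorted(out)


fm = FMIndex("abracadabra")
print(fm.count("abra"), fm.locate("abra"))  # 2 [0, 7]
```

In this sketch `sample_rate` trades space for locate time: each occurrence needs at most `sample_rate - 1` LF steps, each costing $O(\log|\Sigma|)$. The specific sampling choice behind the locate bound stated above is not modeled here.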