首页> 外文会议>International Conference on String Processing and Information Retrieval >First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index
【24h】

First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index

机译:第一个霍夫曼,然后挖洞轮式:一个简单的字母无关的FM-index

获取原文

摘要

Main Results. The basic string matching problem is to determine the occurrences of a short pattern P = p_1p_2 ... p_m in a large text T = t_1t_2 ... t_n, over an alphabet of size a. Indexes are structures built on the text to speed up searches, but they used to take up much space. In recent years, succinct text indexes have appeared. A prominent example is the FM-index, which takes little space (close to that of the compressed text) and replaces the text, as it can search the text (in optimal O(m) time) and reproduce any text substring without accessing it. The main problem of the FM-index is that its space usage depends exponentially on σ, that is, 5H_kn + σ~σ o(n) for any k, H_k being the k-th order entropy of T. In this paper we present a simple variant of the FM-index, which removes its alphabet dependence. We achieve this by, essentially (but not exactly), Huffman-compressing the text and FM-indexing the binary sequence. Our index needs 2n(H_0 + 1)(1 + o(1)) bits, independent of σ, and it searches in O(m(H_0 + 1)) average time, which can be made O(m log σ) in the worst case. Moreover, our index is considerably simpler to implement than most other succinct indexes.
机译:主要结果。基本字符串匹配问题是在大小文本T = t_1t_2 ... t_n中确定短图案p = p_1p_2 ... p_m的发生。索引是在文本上建立的结构,以加速搜索,但它们用于占用大量空间。近年来,出现了简洁的文本索引。一个突出的例子是fm-index,它需要很少的空间(接近压缩文本的空间)并替换文本,因为它可以搜索文本(在最佳O(m)时间)并在不访问它的情况下重现任何文本子字符串。 FM指数的主要问题是,其空间使用量呈指数上σ,即任何k的5h_kn +σ=σo(n),h_k是T的K-TH命令熵。在本文中,我们存在一个简单的FM索引变体,它消除了其字母依赖性。我们通过基本上(但不完全)来实现这一目标,霍夫曼 - 压缩文本和FM-索引二进制序列。我们的索引需要2n(h_0 + 1)(1 + O(1))位,与σ无关,它在O(m(h_0 + 1))中搜索平均时间,这可以进行O(m logσ)最糟糕的情况。此外,我们的索引比大多数其他简洁索引更简单。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号