First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index

机译：第一个霍夫曼，然后挖洞轮式：一个简单的字母无关的FM-index

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

Main Results. The basic string matching problem is to determine the occurrences of a short pattern P = p_1p_2 ... p_m in a large text T = t_1t_2 ... t_n, over an alphabet of size a. Indexes are structures built on the text to speed up searches, but they used to take up much space. In recent years, succinct text indexes have appeared. A prominent example is the FM-index, which takes little space (close to that of the compressed text) and replaces the text, as it can search the text (in optimal O(m) time) and reproduce any text substring without accessing it. The main problem of the FM-index is that its space usage depends exponentially on σ, that is, 5H_kn + σ~σ o(n) for any k, H_k being the k-th order entropy of T. In this paper we present a simple variant of the FM-index, which removes its alphabet dependence. We achieve this by, essentially (but not exactly), Huffman-compressing the text and FM-indexing the binary sequence. Our index needs 2n(H_0 + 1)(1 + o(1)) bits, independent of σ, and it searches in O(m(H_0 + 1)) average time, which can be made O(m log σ) in the worst case. Moreover, our index is considerably simpler to implement than most other succinct indexes.

机译：主要结果。基本字符串匹配问题是在大小文本T = t_1t_2 ... t_n中确定短图案p = p_1p_2 ... p_m的发生。索引是在文本上建立的结构，以加速搜索，但它们用于占用大量空间。近年来，出现了简洁的文本索引。一个突出的例子是fm-index，它需要很少的空间（接近压缩文本的空间）并替换文本，因为它可以搜索文本（在最佳O（m）时间）并在不访问它的情况下重现任何文本子字符串。 FM指数的主要问题是，其空间使用量呈指数上σ，即任何k的5h_kn +σ=σo（n），h_k是T的K-TH命令熵。在本文中，我们存在一个简单的FM索引变体，它消除了其字母依赖性。我们通过基本上（但不完全）来实现这一目标，霍夫曼 - 压缩文本和FM-索引二进制序列。我们的索引需要2n（h_0 + 1）（1 + O（1））位，与σ无关，它在O（m（h_0 + 1））中搜索平均时间，这可以进行O（m logσ）最糟糕的情况。此外，我们的索引比大多数其他简洁索引更简单。

著录项

来源
《International Conference on String Processing and Information Retrieval》|2004年||共2页
会议地点
作者
Szymon Grabowski; Veli Maekinen; Gonzalo Navarro;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类数据备份与恢复;
关键词

相似文献

外文文献
中文文献
专利

1. A SIMPLE ALPHABET-INDEPENDENT FM-INDEX [J] . SZYMON GRABOWSKI, GONZALO NAVARRO, RAFAL PRZYWARSKI, International Journal of Foundations of Computer Science . 2006,第6期

机译：简单的独立于字母FM的索引
2. Simpler FM-index for parameterized string matching [J] . Kim Sung-Hwan, Cho Hwan-Gue Information Processing Letters . 2021,第Jana期

机译：用于参数化字符串匹配的更简单的FM-index
3. An improvement of dynamic Huffman coding with a simple repetition finder [J] . Yokoo H. IEEE Transactions on Communications . 1991,第1期

机译：简单重复查找器对动态霍夫曼编码的改进
4. First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index [C] . Szymon Grabowski, Veli Maekinen, Gonzalo Navarro International Conference on String Processing and Information Retrieval(SPIRE 2004); 20041005-08; Padova(IT) . 2004

机译：首先是霍夫曼，然后是Burrows-Wheeler：一个简单的与字母无关的FM索引
5. Hardware Implementation of a String Matching Algorithm Based on the FM-Index [D] . Fernandez, Edward Bryann Cabanayan 2013

机译：基于FM-Index的字符串匹配算法的硬件实现
6. FMLRC: Hybrid long read error correction using an FM-index [O] . Jeremy R. Wang, James Holt, Leonard McMillan, 2018

机译：FMLRC：使用FM索引的混合式长读错误校正
7. First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index [O] . Szymon Grabowski, Veli Mäkinen, Gonzalo Navarro 2009

机译：第一个霍夫曼，然后是Burrows-Wheeler：一个简单的字母独立的Fm指数

First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index

摘要

著录项

相似文献

相关主题

期刊订阅