An Alphabet-Friendly FM-Index

Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet $\Sigma$. The size of the new index built on a string $T[1,n]$ is bounded by $nH_k(T) + O((n \log\log n)/\log_{|\Sigma|} n)$ bits, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. The above bound holds simultaneously for all $k \le \alpha \log_{|\Sigma|} n$ and $0 < \alpha < 1$. Moreover, the index design does not depend on the parameter $k$, which plays a role only in analysis of the space occupancy. Using our index, the counting of the occurrences of an arbitrary pattern $P[1,p]$ as a substring of $T$ takes $O(p \log|\Sigma|)$ time. Locating each pattern occurrence takes $O(\log|\Sigma| \, (\log^2 n/\log\log n))$ time. Reporting a text substring of length $l$ takes $O((l + \log^2 n/\log\log n) \log|\Sigma|)$ time.
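
The counting bound in the abstract can be illustrated with a minimal sketch of FM-index backward search over a wavelet tree. This is not the authors' construction: it omits compression boosting, stores uncompressed prefix counts instead of succinct bit vectors, and builds the suffix array by naive sorting. The names `WaveletTree`, `build_index`, and `count`, and the `"\0"` sentinel (assumed absent from the text), are hypothetical choices made for illustration. What it does show is that each pattern symbol costs two rank queries, and each rank query visits one node per level of a tree of depth about log |Σ|.

```python
class WaveletTree:
    """Balanced wavelet tree; rank(c, i) visits O(log |alphabet|) nodes."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return  # leaf: at most one distinct symbol remains
        mid = len(self.alphabet) // 2
        left_syms = set(self.alphabet[:mid])
        # ones[i] = how many of the first i symbols go to the right child.
        # Assumption: plain prefix sums stand in for a succinct bit vector.
        self.ones = [0]
        for ch in seq:
            self.ones.append(self.ones[-1] + (ch not in left_syms))
        self.left = WaveletTree(
            [ch for ch in seq if ch in left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [ch for ch in seq if ch not in left_syms], self.alphabet[mid:])

    def rank(self, c, i):
        """Number of occurrences of c among the first i symbols."""
        node = self
        while node.left is not None:
            mid = len(node.alphabet) // 2
            if c < node.alphabet[mid]:
                i -= node.ones[i]      # symbols routed left before position i
                node = node.left
            else:
                i = node.ones[i]       # symbols routed right before position i
                node = node.right
        return i if node.alphabet == [c] else 0


def build_index(text, sentinel="\0"):
    """Build (wavelet tree over the BWT, C array, length) for text + sentinel."""
    t = text + sentinel
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive suffix sorting
    bwt = [t[i - 1] for i in sa]                     # t[-1] is the sentinel
    counts = {}
    for ch in t:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total        # number of symbols in t strictly smaller than ch
        total += counts[ch]
    return WaveletTree(bwt), C, len(t)


def count(index, pattern):
    """Count occurrences of pattern by backward search: two ranks per symbol."""
    wt, C, n = index
    lo, hi = 0, n            # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + wt.rank(c, lo)
        hi = C[c] + wt.rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count(build_index("mississippi"), "ssi")` walks the pattern right to left, narrowing the suffix-array interval at each step, and ends with an interval of width 2.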

Bibliographic details

Source: International Conference on String Processing and Information Retrieval, 2004, 11 pages
Authors: Paolo Ferragina; Giovanni Manzini; Veli Mäkinen; Gonzalo Navarro
Original format: PDF
Chinese Library Classification: Data backup and recovery

Similar literature

1. Secure Wavelet Matrix: Alphabet-Friendly Privacy-Preserving String Search for Bioinformatics [J]. Sudo Hiroki, Jimbo Masanobu, Nuida Koji. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2019, no. 5

2. Simpler FM-index for parameterized string matching [J]. Kim Sung-Hwan, Cho Hwan-Gue. Information Processing Letters. 2021, no. Jana

3. Enabling fast and energy-efficient FM-index exact matching using processing-near-memory [J]. Herruzo Jose M., Fernandez Ivan, Gonzalez-Navarro Sonia. Journal of Supercomputing. 2021, no. 9

4. An Alphabet-Friendly FM-Index [C]. Paolo Ferragina, Giovanni Manzini, Veli Mäkinen. International Conference on String Processing and Information Retrieval (SPIRE 2004); October 5-8, 2004; Padova (IT). 2004

5. Hardware Implementation of a String Matching Algorithm Based on the FM-Index [D]. Fernandez, Edward Bryann Cabanayan. 2013

6. FMLRC: Hybrid long read error correction using an FM-index [O]. Jeremy R. Wang, James Holt, Leonard McMillan. 2018

7. An alphabet-friendly FM-index [O]. Paolo Ferragina, Giovanni Manzini, Veli Mäkinen. 2004

