首页> 外文期刊>Algorithmica >Improved Approximate String Matching Using Compressed Suffix Data Structures
【24h】

Improved Approximate String Matching Using Compressed Suffix Data Structures

机译:使用压缩后缀数据结构的改进的近似字符串匹配

获取原文
获取原文并翻译 | 示例
       

摘要

Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an O(nÖ{logn}log|A|)O(nsqrt{log n}log |A|) -bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(|A|mlog log n+occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(nlog 2 n) bits. The space of our data structure can be further reduced to O(nlog |A|) with the query time increasing by a factor of log ε n, for 0<ε≤1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A| k m k (k+log log n)+occ) and O(log ε n(|A| k m k (k+log log n)+occ)) time using an O(nÖ{logn}log|A|)O(nsqrt{log n}log |A|) -bit and an O(nlog |A|)-bit indexing data structures, respectively. We assume that the alphabet size |A| is bounded by O(2Ö{logn})O(2^{sqrt{log n}}) for the O(nÖ{logn}log|A|)O(nsqrt{log n}log |A|) -bit space data structure.
机译:近似字符串匹配是关于通过允许一定程度的错误在文本中找到给定的字符串模式。在本文中,我们提出了一种空间高效的数据结构来解决1-不匹配和1-差异问题。给定在字母A上长度为n的文本T,我们可以对T进行预处理,并给出O(nÖ{logn} log | A |)O(nsqrt {log n} log | A |)位空间数据结构,以便对于任何长度为m的查询模式P,我们可以在O(| A | mlog log n + occ)时间中找到P的所有1-不匹配(或1-差)出现,其中occ是出现的次数。由于数据结构的空间为o(nlog 2 n)位,因此这是最快的已知查询时间。对于0 <ε≤1,随着查询时间增加log ε n的因数,我们的数据结构空间可以进一步减小为O(nlog | A |)。此外,我们的解决方案可以推广为解决O(| A | k m k (k + log log n的k不匹配(和k差)问题。 )+ occ)和O(log ε n(| A | k m k (k + log log n)+ occ))时间分别使用O(nÖ{logn} log | A |)O(nsqrt {log n} log | A |)位和O(nlog | A |)位索引数据结构。我们假设字母大小| A |由O(nÖ{logn} log | A |)O(nsqrt {log n}的O(2 Ö{logn} )O(2 ^ {sqrt {log n}})界定log | A |)-位空间数据结构。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号