【24h】

Approximate String Matching with Lempel-Ziv Compressed Indexes

机译:Lempel-Ziv压缩索引的近似字符串匹配

获取原文
获取原文并翻译 | 示例

摘要

A compressed full-text self-index for a text T is a data structure requiring reduced space and able of searching for patterns P in T. Furthermore, the structure can reproduce any substring of T, thus it actually replaces T. Despite the explosion of interest on self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest, say, in computational biology applications. We present an ASM algorithm that works on top of a Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are the best in practice for this problem. We show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to the Lempel-Ziv index. We show experimentally that our algorithm has a competitive performance and provides a useful space-time tradeoff compared to classical indexes.
机译:文本T的压缩的全文本自我索引是一种数据结构,需要减少空间并能够搜索T中的模式P。此外,该结构可以再现T的任何子字符串,因此实际上替代T。近年来,人们对自索引产生了浓厚的兴趣,除了基本的精确搜索之外,搜索功能还没有取得太大进展。在本文中,我们关注于索引近似字符串匹配(ASM),这在计算生物学应用中引起了极大的兴趣。我们提出了一种可在Lempel-Ziv自索引之上工作的ASM算法。我们考虑所谓的混合索引,它实际上是解决此问题的最佳方法。我们表明,Lempel-Ziv索引可以看作是经典q样本索引的扩展。我们对这种类型的索引提供了新的见解,它们可能具有独立的意义,然后将其应用于Lempel-Ziv索引。我们通过实验表明,与经典索引相比,我们的算法具有竞争优势,并且提供了有用的时空折衷。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号