首页> 外文期刊>Algorithmica >Efficient Online String Matching Based on Characters Distance Text Sampling
【24h】

Efficient Online String Matching Based on Characters Distance Text Sampling

机译:基于字符距离文本采样的高效在线字符串匹配

获取原文
获取原文并翻译 | 示例

摘要

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology.Sampled string matchingis an efficient approach recently introduced in order to overcome the prohibitive space requirements of an index construction, on the one hand, and drastically reduce searching time for the online solutions, on the other hand. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a givenpivotcharacter and then to search online the sampled data for any occurrence of the sampled pattern, before verifying the original text. From a theoretical point of view we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-time complexity. From a practical point of view it turns out that our solution shows a sub-linear behaviour in practice and speeds up online searching by a factor of up to 9, using limited additional space whose amount goes from 11 to 2.8% of the text size, with a gain up to 50% if compared with previous solutions.
机译:在文本中搜索所有模式的模式是计算机科学中的一个基本问题,其中许多其他字段中的应用程序,如自然语言处理,信息检索和计算生物学。采样的字符串匹配最近引入的有效方法以克服禁止空间另一方面,一方面,索引结构的要求,并急剧减少在线解决方案的搜索时间。在本文中,我们基于字符距离采样方法为采样字符串匹配问题提出了一种新的算法。主要思想是在验证原始文本之前对给定节能特征的连续出现的连续发生之间的距离进行示例,然后在在线中搜索采样数据的采样数据。从理论的角度来看,我们证明,在合适的条件下,我们的解决方案可以实现线性最坏情况的时间复杂性和最佳的平均时间复杂性。从实际的角度来看,我们的解决方案在实践中显示了子线性行为,并使用有限的附加空间在线搜索到最多9倍,其数量从11到2.8%的文本大小的数量从11%到2.8%。与以前的解决方案相比,增益高达50%。

著录项

  • 来源
    《Algorithmica》 |2020年第11期|3390-3412|共23页
  • 作者单位

    Univ Catania Dipartimento Matemat & Informat Viale A Doria 6 I-95125 Catania Italy;

    Univ Catania Dipartimento Matemat & Informat Viale A Doria 6 I-95125 Catania Italy;

    Univ Messina Dipartimento Sci Cognit Via Concez 6 I-98122 Messina Italy;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    String matching; Text processing; Efficient searching; Text indexing;

    机译:字符串匹配;文本处理;高效搜索;文本索引;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号