...
【24h】

An Encoding for Order-Preserving Matching

机译:保留订单匹配的编码

获取原文
           

摘要

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., (4, 1, 3, 2) and (10, 3, 7, 5) are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size sigma and a constant c >=1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m >= log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log(sigma) = Omega(log log n); our query time is optimal if log(sigma) = Omega(log n). Our space bound contrasts with the Omega(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters.
机译:编码数据结构存储了足够的信息来回答它们要支持的查询,但还不足以恢复其基础数据集。在本文中,我们给出了第一个编码数据结构,以解决保序模式匹配的难题。该问题仅在几年前才引入,但由于其在数据分析中的应用而引起了极大的关注。如果两个字符串的字符的相对顺序相同,则称它们为保留顺序的匹配项:例如(4、1、3、2)和(10、3、7、5)是保留顺序的匹配项。我们展示了如何在给定大小sigma的任意字母上的字符串S [1..n]和常数c> = 1的情况下,如何构建O(n log log n)位编码,从而稍后再给定模式P [1..m],其中m> = log ^ cn,我们可以返回在O(m)时间内S中P的保序出现次数。在同一时限内,我们还可以返回S中P的某些保留顺序匹配的开始位置(如果存在这样的匹配)。我们证明,如果log(sigma)= Omega(log log n);则我们的空间界限在最佳常数因子之内;如果log(sigma)= Omega(log n),我们的查询时间是最佳的。我们的空间限制与最坏情况下存储S本身所需的Omega(n log n)位,用于保留顺序的模式匹配索引(对模式长度没有限制)或用于标准模式匹配的索引(即使存在限制)形成对比图案长度。此外,我们可以仅知道每个字符与O(log ^ c n)相邻字符的比较来构建编码。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号