Journal: Data Mining and Knowledge Discovery

A space efficient solution to the frequent string mining problem for many databases


Abstract

The frequent string mining problem is to find all substrings of a collection of string databases which satisfy database-specific minimum and maximum frequency constraints. Our contribution improves the existing linear-time algorithm for this problem in such a way that the peak memory consumption is a constant factor of the size of the largest database of strings. We show how the results for each database can be stored implicitly in space proportional to the size of the database, making it possible to traverse the results in lexicographical order. Furthermore, we present a linear-time algorithm which calculates the intersection of the results of different databases. This algorithm is based on an algorithm to merge two suffix arrays, and our modification allows us to also calculate the LCP table of the resulting suffix array during the merging.
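To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the paper's linear-time, space-efficient algorithm (which relies on suffix arrays and LCP tables); it only illustrates what a solution must output. It assumes that the frequency of a pattern in a database is the number of strings in that database containing the pattern, and the names `frequency` and `frequent_strings` are hypothetical helpers chosen for this illustration.

```python
from typing import List, Sequence, Set, Tuple


def substrings(s: str) -> Set[str]:
    """Return every non-empty substring of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def frequency(pattern: str, database: Sequence[str]) -> int:
    """Assumed definition: number of strings in the database containing pattern."""
    return sum(1 for s in database if pattern in s)


def frequent_strings(
    databases: Sequence[Sequence[str]],
    constraints: Sequence[Tuple[int, int]],
) -> List[str]:
    """Brute-force reference: return, in lexicographical order, all substrings
    whose frequency in database i lies within constraints[i] = (min_i, max_i)."""
    # Candidate patterns: every substring occurring anywhere in the collection.
    candidates: Set[str] = set()
    for db in databases:
        for s in db:
            candidates |= substrings(s)

    result = []
    for pattern in sorted(candidates):
        # Keep the pattern only if it meets the constraint of every database.
        if all(
            lo <= frequency(pattern, db) <= hi
            for db, (lo, hi) in zip(databases, constraints)
        ):
            result.append(pattern)
    return result


if __name__ == "__main__":
    d1 = ["abab", "abc"]
    d2 = ["bcd", "xab"]
    # Pattern must occur in both strings of d1 and in at most one string of d2.
    print(frequent_strings([d1, d2], [(2, 2), (0, 1)]))
```

The brute-force version enumerates a quadratic number of candidates per string, so it is only suitable for checking small inputs against a faster implementation.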
