Method and system for mining generalized sequential patterns in a large database

Abstract

A method and apparatus are disclosed for mining generalized sequential patterns from a large database of data sequences, taking into account user-specified constraints on the time gap between adjacent elements of the patterns, a sliding time window, and taxonomies over data items. The invention first identifies the items with at least a minimum support, i.e., those contained in more than a minimum number of data sequences. These items are used as a seed set to generate candidate sequences. Next, the support of the candidate sequences is counted. The invention then identifies those candidate sequences that are frequent, i.e., those with a support above the minimum support. The frequent candidate sequences are entered into the set of sequential patterns and are used to generate the next group of candidate sequences. Preferably, the candidate sequences are generated by joining previously found frequent candidate sequences, and candidate sequences having a contiguous subsequence without minimum support are discarded. In addition, the invention includes a hash-tree data structure for storing the candidate sequences and memory management techniques for performance improvement.
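The level-wise loop described in the abstract (seed with frequent items, join, prune, count, repeat) can be illustrated with a minimal Python sketch. It is not the patented implementation and makes several simplifying assumptions: each element of a sequence is a single item rather than an itemset; time-gap constraints, the sliding window, taxonomies, and the hash tree are all omitted; and "frequent" is taken to mean a support count of at least `min_support`. Because no gap constraint is modeled, the prune step checks every subsequence obtained by deleting one item, rather than only the contiguous subsequences named in the abstract. All function names are hypothetical.

```python
from itertools import product


def contains(data_seq, pattern):
    """Return True if pattern is an order-preserving subsequence of data_seq."""
    it = iter(data_seq)
    # `item in it` advances the iterator, so the order of matches is enforced.
    return all(item in it for item in pattern)


def support(database, pattern):
    """Count the data sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


def generate_candidates(frequent):
    """Join frequent k-sequences into (k+1)-candidates, then prune."""
    frequent = set(frequent)
    candidates = set()
    for s1, s2 in product(frequent, repeat=2):
        # Join: s1 minus its first item must equal s2 minus its last item.
        if s1[1:] == s2[:-1]:
            cand = s1 + s2[-1:]
            # Prune (simplified, no gap constraints): every subsequence
            # formed by deleting one item must itself be frequent.
            if all(cand[:i] + cand[i + 1:] in frequent
                   for i in range(len(cand))):
                candidates.add(cand)
    return candidates


def mine_sequential_patterns(database, min_support):
    """Return {pattern: support} for all frequent sequential patterns."""
    items = {item for seq in database for item in seq}
    # Seed set: frequent single items (assumption: support >= min_support).
    frequent = {(i,) for i in items if support(database, (i,)) >= min_support}
    patterns = {}
    while frequent:
        for p in frequent:
            patterns[p] = support(database, p)
        candidates = generate_candidates(frequent)
        frequent = {c for c in candidates
                    if support(database, c) >= min_support}
    return patterns


if __name__ == "__main__":
    # Toy database: each tuple is one data sequence of single items.
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
    for pattern, count in sorted(mine_sequential_patterns(db, 2).items()):
        print(pattern, count)
```

On the toy database, the candidate `("a", "b", "d")` is produced by the join but discarded by the prune step because `("a", "d")` is not frequent, so its support is never counted. A real implementation would replace the linear scan in `support` with an index such as the hash tree mentioned in the abstract.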