Journal: Data Mining and Knowledge Discovery

Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach


Abstract

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns. In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, the FP-tree, which avoids costly, repeated database scans; (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets; and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.
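
The three techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `build_tree`, `fp_growth`, and `mine` are hypothetical, `min_support` is assumed to be an absolute transaction count, and the single-path shortcut described in the full paper is omitted for brevity. The sketch builds a prefix tree over frequency-ordered items, then recursively mines the conditional pattern base of each frequent item, so no candidate sets are generated.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from a list of (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes that carry it; frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending support
        # (ties broken by item name) so that shared prefixes merge.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item, item_support in frequent.items():
        pattern = suffix | {item}
        yield pattern, item_support
        # Conditional pattern base: the prefix path above each node that
        # holds `item`, weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database.
        yield from fp_growth(conditional, min_support, pattern)


def mine(transactions, min_support):
    """Convenience wrapper: plain transactions in, {itemset: support} out."""
    weighted = [(list(t), 1) for t in transactions]
    return dict(fp_growth(weighted, min_support))


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(mine(db, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In this toy database every single item and every pair meets the threshold of 3, while the triple a, b, c occurs only twice and is never emitted. The sketch rebuilds a small tree for each conditional pattern base, which keeps the code short; a tuned implementation would likely reuse node links and handle single-path trees directly.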
