Journal: IEEE Transactions on Knowledge and Data Engineering

TFP: an efficient algorithm for mining top-k frequent closed itemsets



Abstract

Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without min_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a combined top-down and bottom-up FP-Tree traversal strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
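
To make the mining task concrete, the following is a minimal brute-force sketch of what "top-k frequent closed itemsets of length no less than min_l" means, together with the idea of raising min_support from 0 as results accumulate. It is not the TFP algorithm: it uses no FP-Tree, no closed node count, and no descendant_sum, and it enumerates every item combination, so it is only suitable for toy inputs. The function name `top_k_closed`, the sample transactions, and the lexicographic tie-breaking are assumptions made for illustration.

```python
from itertools import combinations
import heapq


def top_k_closed(transactions, k, min_l):
    """Brute-force illustration (NOT TFP): return the k most frequent
    closed itemsets with at least min_l items, plus the final min_support."""
    txns = [frozenset(t) for t in transactions]
    items = sorted(set().union(*txns))
    found = {}        # closed itemset -> support
    min_support = 0   # starts at 0 and is raised once k results are known

    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            covering = [t for t in txns if t.issuperset(combo)]
            support = len(covering)
            # Skip itemsets that occur nowhere or cannot reach the top k.
            if support == 0 or support < min_support:
                continue
            # The closure of an itemset is the intersection of all
            # transactions containing it; it has the same support.
            closed = frozenset.intersection(*covering)
            if len(closed) < min_l:
                continue
            found[closed] = support
            if len(found) >= k:
                # Raise the threshold to the k-th largest support so far.
                min_support = heapq.nlargest(k, found.values())[-1]

    # Assumption: ties in support are broken lexicographically.
    ranked = sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [(tuple(sorted(c)), s) for c, s in ranked[:k]], min_support


# Hypothetical toy database of five transactions.
db = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "d"},
]
result, final_min_support = top_k_closed(db, k=2, min_l=2)
print(result, final_min_support)
```

In this toy database the single item b occurs in every transaction, but it is excluded because its length is below min_l = 2. The user supplies only k and min_l; the support threshold is derived from the results found so far rather than specified up front.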
