Journal of Experimental Algorithmics

Mining Frequent Generalized Itemsets And Generalized Association Rules Without Redundancy

Abstract

This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized setting. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in pattern mining: with small support and confidence thresholds, the user is overwhelmed by the extraordinary number of identified patterns and associations; with large thresholds, some interesting patterns and associations fail to be identified.

Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. Although that expansion is not recommended in most practical cases, we perform it in order to compare against existing algorithms that handle only ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies with a depth of at most three or four.

For each of the two problems, a straightforward lattice-based approach is briefly discussed, and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.
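To make the taxonomy idea concrete, here is a minimal sketch of how support for generalized itemsets can be counted: each transaction is extended with the ancestors of its items, and a g-itemset is supported by every extended transaction that contains it. This is a common way to define g-itemset support, assumed here rather than taken from the paper. The taxonomy, the transactions, and the names `PARENT`, `TRANSACTIONS`, `ancestors`, `extend`, `support`, and `confidence` are all hypothetical. MFGI_class and EGR_class themselves are not shown.

```python
# Hypothetical toy taxonomy: child -> parent (root items have no entry).
PARENT = {
    "beef": "meat",
    "chicken": "meat",
    "skim milk": "milk",
    "whole milk": "milk",
}

# Hypothetical transactions containing only leaf-level items.
TRANSACTIONS = [
    {"beef", "skim milk"},
    {"chicken", "whole milk"},
    {"beef", "whole milk"},
    {"chicken"},
]


def ancestors(item):
    """Return the set of proper ancestors of an item in the taxonomy."""
    result = set()
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def extend(transaction):
    """Add every ancestor of every purchased item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item)
    return extended


def support(g_itemset, transactions):
    """Fraction of transactions whose extended form contains the g-itemset."""
    wanted = set(g_itemset)
    hits = sum(1 for t in transactions if wanted <= extend(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the g-rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


for pair in [{"beef", "milk"}, {"chicken", "milk"}, {"meat", "milk"}]:
    print(sorted(pair), support(pair, TRANSACTIONS))
```

On this toy data, (beef, milk) has support 0.5 and (chicken, milk) has support 0.25, while (meat, milk) reaches 0.75. With a minimum support of 0.6, only the generalized pair qualifies, which mirrors the (meat, milk) example above.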
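Expanding max frequent g-itemsets into ordinary frequent g-itemsets relies on a monotonicity property. If a g-itemset is frequent, then every g-itemset obtained by dropping items or by replacing an item with one of its ancestors is also frequent, because its support can only stay the same or grow. The sketch below is a brute-force enumeration for illustration only, not the paper's expansion procedure. It assumes two things the abstract does not state: a g-itemset may not contain an item together with one of its ancestors, and the input is already a collection of max frequent g-itemsets. The names `PARENT`, `ancestors`, `is_g_itemset`, and `expand` are hypothetical.

```python
from itertools import product

# Hypothetical toy taxonomy: child -> parent (root items have no entry).
PARENT = {
    "beef": "meat",
    "chicken": "meat",
    "skim milk": "milk",
    "whole milk": "milk",
}


def ancestors(item):
    """Return the set of proper ancestors of an item in the taxonomy."""
    result = set()
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def is_g_itemset(items):
    """Assumed rule: no item may appear together with one of its ancestors."""
    return all(a not in ancestors(b) for a in items for b in items)


def expand(max_itemsets):
    """Enumerate every non-empty g-itemset implied by the max frequent ones."""
    result = set()
    for max_itemset in max_itemsets:
        # Per item: drop it (None), keep it, or replace it with an ancestor.
        choices = [[None, item, *sorted(ancestors(item))] for item in max_itemset]
        for combo in product(*choices):
            items = frozenset(x for x in combo if x is not None)
            if items and is_g_itemset(items):
                result.add(items)
    return result


frequent = expand([{"beef", "skim milk"}])
print(len(frequent))  # prints 8
```

A single max frequent g-itemset (beef, skim milk) already implies eight frequent g-itemsets, including (meat, milk). The count grows multiplicatively with itemset length and taxonomy depth, which is consistent with the recommendation to keep the compact representation in most practical cases.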
