
Minimum Description Length Principle: Generators are Preferable to Closed Patterns

Abstract

The generators and the unique closed pattern of an equivalence class of itemsets share a common set of transactions. The generators are the minimal ones among the equivalent itemsets, while the closed pattern is the maximum one. As a generator is usually smaller than the closed pattern in cardinality, by the Minimum Description Length Principle, the generator is preferable to the closed pattern in inductive inference and classification. To efficiently discover frequent generators from a large dataset, we develop a depth-first algorithm called Gr-growth. The idea is novel in contrast to traditional breadth-first bottom-up generator-mining algorithms. Our extensive performance study shows that Gr-growth is significantly faster (by one or even two orders of magnitude when the support thresholds are low) than the existing generator mining algorithms. It can also be faster than state-of-the-art frequent closed itemset mining algorithms such as FPclose and CLOSET+.
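The definitions in the abstract can be made concrete with a small brute-force sketch. It is not Gr-growth and makes no attempt at efficiency: it enumerates every itemset, groups itemsets by the transactions that contain them, and reads off the minimal members (generators) and the maximum member (closed pattern) of each class. The toy transactions and the function names `support_set`, `equivalence_classes`, and `generators_and_closed` are assumptions made for illustration only.

```python
from itertools import combinations

def support_set(itemset, transactions):
    """Return the indices of the transactions that contain every item of itemset."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

def equivalence_classes(transactions, min_support=1):
    """Group frequent non-empty itemsets by the set of transactions they occur in.

    Brute force over all item combinations, so only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))
    classes = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = support_set(itemset, transactions)
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(itemset)
    return classes

def generators_and_closed(members):
    """Split one equivalence class into its generators and its closed pattern."""
    # Generators: members with no proper subset inside the same class.
    generators = [m for m in members if not any(o < m for o in members)]
    # Closed pattern: the unique maximum member, equal to the union of all members.
    closed = frozenset().union(*members)
    return generators, closed

# Hypothetical toy dataset: four transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("cd"),
]

classes = equivalence_classes(transactions, min_support=1)
for tids, members in classes.items():
    gens, closed = generators_and_closed(members)
    print(sorted(tids), [sorted(g) for g in gens], sorted(closed))
```

On this toy data, the class occurring in transactions 0, 1, and 2 has the generators {a} and {b} and the closed pattern {a, b}; the class occurring only in transaction 3 has the generator {d} and the closed pattern {c, d}. In both cases a generator describes the same set of transactions with fewer items than the closed pattern, which is the sense in which the abstract calls generators preferable under the Minimum Description Length Principle.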
