Journal: Computing and Informatics

An Efficient Itemset Representation for Mining Frequent Patterns in Transactional Databases


Abstract

In this paper, we propose a very efficient itemset representation for mining frequent itemsets from transactional databases. The combinatorial number system is used to represent each frequent k-itemset uniquely by a single integer, for any k ≥ 2. Experiments show that memory requirements can be reduced by up to 300 %, especially for very low minimum support thresholds. Furthermore, we use the combinatorial number system to represent candidate itemsets in the iterative join-based approach. Starting from the second iteration (k = 2), the novel algorithm maintains a one-dimensional array, rank. At index r of this array, the algorithm stores the unique integer representation of the r-th candidate in lexicographic order. The rank array makes joining two candidate k-itemsets an O(1) operation instead of an O(k) one. Additionally, the rank array speeds up determining which candidates are contained in a given transaction during the support counting and testing phase. Finally, we believe that ranking itemsets with the combinatorial number system can be effectively integrated into pattern-growth algorithms, which are the state of the art in frequent itemset mining, to further improve their performance.
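To make the combinatorial number system concrete, here is a minimal Python sketch of the standard combinadic encoding, assuming items have already been mapped to distinct non-negative integers. For a k-itemset with sorted items c_1 < c_2 < ... < c_k, the code is C(c_1, 1) + C(c_2, 2) + ... + C(c_k, k), which is unique among all itemsets of the same size k. The helper names `rank`, `unrank` and `count_support` are hypothetical. `count_support` only illustrates using the integer as a lookup key during support counting; it does not reproduce the paper's rank array or its O(1) join, whose details are not given in the abstract.

```python
from itertools import combinations
from math import comb


def rank(itemset):
    """Encode a k-itemset of distinct non-negative item ids as one integer.

    Combinatorial number system (combinadic): with the items sorted as
    c_1 < c_2 < ... < c_k, the code is C(c_1, 1) + C(c_2, 2) + ... + C(c_k, k).
    For a fixed k this is a bijection between k-itemsets and non-negative
    integers, so the code is unique only among itemsets of the same size.
    """
    items = sorted(itemset)
    if len(set(items)) != len(items) or (items and items[0] < 0):
        raise ValueError("items must be distinct non-negative integers")
    return sum(comb(c, i) for i, c in enumerate(items, start=1))


def unrank(code, k):
    """Decode an integer produced by rank() back into a sorted k-itemset."""
    items = []
    for i in range(k, 0, -1):
        # Greedy step: choose the largest c with C(c, i) <= remaining code.
        c = i - 1
        while comb(c + 1, i) <= code:
            c += 1
        items.append(c)
        code -= comb(c, i)
    return tuple(reversed(items))


def count_support(transactions, candidates, k):
    """Hypothetical support-counting pass keyed by combinadic codes.

    Each k-subset of a transaction is encoded once and looked up in a dict,
    instead of being compared item by item against every candidate.
    """
    if any(len(set(c)) != k for c in candidates):
        raise ValueError("all candidates must be k-itemsets")
    support = {rank(c): 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(set(t)), k):
            code = rank(subset)
            if code in support:
                support[code] += 1
    # Decode the integer keys back into readable itemsets.
    return {unrank(code, k): count for code, count in support.items()}


transactions = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [0, 1, 2, 3]]
candidates = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(rank((0, 3)), unrank(3, 2))  # prints: 3 (0, 3)
print(count_support(transactions, candidates, 2))
# prints: {(0, 1): 2, (0, 2): 3, (1, 2): 3, (2, 3): 3}
```

Two design notes on this sketch. First, sorting itemsets by their combinadic code yields colexicographic order, not lexicographic order. Second, enumerating every k-subset of a transaction costs C(|t|, k) encodings, so this naive pass suits short transactions and small k.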
