Journal: Intelligent Data Analysis

An effective method for approximate representation of frequent itemsets

Abstract

In data mining, finding frequent itemsets is a critical step in discovering association rules. However, the number of frequent itemsets may be huge if the minimum support threshold is set low or if the transaction database to be mined contains many items. Several approaches have therefore been proposed to represent frequent itemsets compactly. For example, the maximal itemset approach keeps a border composed of the maximal frequent itemsets, which separates frequent itemsets from infrequent ones. It can recover all the frequent itemsets but not their actual frequencies. In contrast, the closed itemset approach can recover each frequent itemset together with its exact frequency. Another approach, based on reference itemsets, recovers each frequent itemset and approximately estimates its frequency. In this paper, we propose an efficient algorithm that recovers each frequent itemset and its approximate frequency from the kept maximal itemsets, the frequent 1-itemsets, their supports, and some key information. The maximal frequent itemsets are used to recover all frequent itemsets, which are then organized into a simple flow network with levels. Next, the kept key information is used to derive approximate supports of the frequent itemsets in the flow network through a flow process. Finally, a series of experiments is conducted to show the compression effects of the proposed algorithm.
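To make the trade-off between the representations concrete, here is a minimal Python sketch of the two baselines named above. It is not the paper's flow-network algorithm: the abstract does not specify the kept "key information" or the flow process, so neither is reproduced. The sketch assumes a toy transaction database given as Python sets and brute-force support counting, and every function name (`all_supports`, `maximal_itemsets`, `closed_itemsets`, `recover_frequent_itemsets`, `group_by_level`, `support_from_closed`, `upper_bound_from_singletons`) is hypothetical. It shows that maximal itemsets recover *which* itemsets are frequent by downward closure, and that closed itemsets recover exact supports. It also shows that the supports of frequent 1-itemsets alone give only an upper bound on a larger itemset's support.

```python
from itertools import combinations


def all_supports(transactions, min_sup):
    """Brute-force support counting over every candidate itemset.
    Exponential in the number of items, so only suitable for toy data."""
    items = sorted(set().union(*transactions))
    supports = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup:
                supports[candidate] = count
    return supports


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset (the border)."""
    return [x for x in frequent if not any(x < y for y in frequent)]


def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {x: s for x, s in frequent.items()
            if not any(x < y and frequent[y] == s for y in frequent)}


def recover_frequent_itemsets(maximal):
    """Downward closure: every non-empty subset of a maximal frequent
    itemset is frequent, so enumerating subsets recovers all frequent
    itemsets (but none of their supports)."""
    recovered = set()
    for m in maximal:
        items = sorted(m)
        for k in range(1, len(items) + 1):
            recovered.update(frozenset(c) for c in combinations(items, k))
    return recovered


def group_by_level(itemsets):
    """Hypothetical helper: level k holds the k-itemsets."""
    levels = {}
    for x in itemsets:
        levels.setdefault(len(x), []).append(x)
    return levels


def support_from_closed(itemset, closed):
    """Exact support of a frequent itemset: the largest support among
    the closed itemsets that contain it."""
    return max(s for c, s in closed.items() if itemset <= c)


def upper_bound_from_singletons(itemset, item_supports):
    """Anti-monotonicity: supp(X) <= min of supp({i}) over items i in X."""
    return min(item_supports[i] for i in itemset)


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                    {"a", "b", "c"}, {"b"}]
    freq = all_supports(transactions, min_sup=2)
    maximal = maximal_itemsets(freq)
    # The border alone recovers every frequent itemset.
    assert recover_frequent_itemsets(maximal) == set(freq)
    # Closed itemsets recover every support exactly.
    closed = closed_itemsets(freq)
    assert all(support_from_closed(x, closed) == s for x, s in freq.items())
    # Singleton supports only bound the support of {b, c} from above.
    singles = {next(iter(x)): s for x, s in freq.items() if len(x) == 1}
    bc = frozenset({"b", "c"})
    print(freq[bc], upper_bound_from_singletons(bc, singles))  # 2 3
```

On this toy database with a minimum support of 2, all seven non-empty itemsets are frequent and `{a, b, c}` is the only maximal itemset. The singleton bound for `{b, c}` is 3, while its true support is 2. That gap is what any approximate representation built on maximal itemsets and 1-itemset supports must narrow with additional kept information. `group_by_level` only mirrors the level structure mentioned in the abstract. It does not build the paper's flow network.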