Foreign-language conference: International Conference on Data Warehousing and Knowledge Discovery

Algorithms for Discovery of Frequent Superset, Rather than Frequent Subset


Abstract

In this paper, we propose a novel mining task: mining frequent supersets from a database of itemsets, which is useful in bioinformatics, e-learning systems, job-shop scheduling, and so on. A superset is frequent if it contains more transactions than the minimum support threshold. Intuitively, following the Apriori algorithm, level-wise discovery starts from 1-itemsets, then 2-itemsets, and so forth. However, such steps cannot use the Apriori property to reduce the search space, because even if an itemset is not frequent, its supersets may be frequent. To solve this problem, we propose three methods. The first is an Apriori-based approach, called Apriori-C. The second is an Eclat-based approach, called Eclat-C, which is depth-first. The last is the proposed data complement technique (DCT), which lets us use an original frequent itemset mining approach to mine frequent supersets. The experimental study compares the performance of the three proposed methods by considering the effect of the number of transactions, the average length of transactions, the number of different items, and the minimum support.
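
The idea of counting transactions contained in a candidate, and of reusing ordinary frequent itemset mining on complemented data, can be illustrated with a small sketch. This is not the paper's implementation: the function names are hypothetical, the enumeration is brute force rather than Apriori-C, Eclat-C, or the authors' DCT, and the threshold is treated as "at least `min_support`", which is an assumption. The sketch relies only on the set identity that a transaction T is contained in X exactly when the complement of X is contained in the complement of T.

```python
from itertools import combinations


def superset_support(candidate, transactions):
    """Count the transactions that are fully contained in `candidate`."""
    return sum(1 for t in transactions if t <= candidate)


def all_itemsets(items):
    """Enumerate every subset of `items` (brute force, illustration only)."""
    ordered = sorted(items)
    for k in range(len(ordered) + 1):
        for combo in combinations(ordered, k):
            yield frozenset(combo)


def frequent_supersets_direct(transactions, min_support):
    """Hypothetical helper: check every candidate against the definition."""
    items = frozenset().union(*transactions)
    return {
        c for c in all_itemsets(items)
        if superset_support(c, transactions) >= min_support  # assumed ">="
    }


def frequent_supersets_via_complement(transactions, min_support):
    """Hypothetical helper: mine ordinary frequent itemsets on the
    complemented database, then complement each result back."""
    items = frozenset().union(*transactions)
    complemented = [items - t for t in transactions]
    result = set()
    for y in all_itemsets(items):
        # Ordinary (subset) support of y in the complemented database.
        support = sum(1 for t in complemented if y <= t)
        if support >= min_support:
            result.add(items - y)  # map back to a superset of the originals
    return result


transactions = [frozenset("ab"), frozenset("bc"), frozenset("b"), frozenset("abc")]
direct = frequent_supersets_direct(transactions, min_support=2)
via_complement = frequent_supersets_via_complement(transactions, min_support=2)
```

On this toy database both functions return the same three itemsets, {a, b}, {b, c}, and {a, b, c}. Because the complemented problem is an ordinary frequent itemset problem, any standard miner could in principle replace the brute-force loop in the second function.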
