首页> 中文期刊> 《计算机技术与发展》 >一种基于SFP树的快速关联规则挖掘算法

一种基于SFP树的快速关联规则挖掘算法

         

摘要

对于传统的FP-Growth算法而言,当事务数据库D很大时,构造基于内存的FP树可能是不现实的.针对此问题,提出了一种基于样本事务数据库的SFP算法.该方法对事务数据库D进行随机抽样,得到样本数据库S,此时以比指定的支持度min_sup小的支持度(min_sup')在S中挖掘频繁项集L',根据求得的频繁项集L',在剩余的数据库D-S中求得L'中各事务的支持数,这在大多数情况下就可以求得所有的频繁项集,但是有时可能会漏掉一些.这时可以对D进行二次扫描以发现漏掉的频繁项集.该算法大多数情况下只需要对数据库进行一次扫描,最坏情况下也只需要对数据库进行二次扫描.当把效率放在首位时,比如计算密集事务数据库的频繁项集时,SFP算法尤其合适.%For the traditional algorithm of FP growth, when database is very big, the structure of FP tree based on memory, sometimes is not realistic. According to this problem, put forward a SFP algorithm based on the sample of the transaction database. This method gets sample transaction database S from database D through random sampling, firstly in less than a specified min_sup support, digging frequent itemsets L from S, then calculating L each element of the support in the rest of the data sets D-S, finally getting L by min_sup.In most cases it can he obtained all the frequent itemsets, but sometimes no. In this time omit the frequent itemsets a second scan. The algorithm needs only one scan of database in most cases, the worst case scenario also needs only second scan to database. When the efficiency is the most important, such as intensive application must be frequently calculated, SFP algorithm is the first selecting.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号