首页> 外文期刊>Journal of supercomputing >ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining
【24h】

ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining

机译:ANG:Apriori和图计算技术的组合,用于频繁项集挖掘

获取原文
获取原文并翻译 | 示例
           

摘要

The Apriori algorithm is one of the most well-known and widely accepted methods for the association rule mining. In Apriori, it uses a prefix tree to represent k-itemsets, generates k-itemset candidates based on the frequent (k-1)-itemsets, and determines the frequent k-itemsets by traversing the prefix tree iteratively based on the transaction records. When k is small, the execution of Apriori is very efficient. However, the execution of Apriori could be very slow when k becomes large because of the deeper recursion depth to determine the frequent k-itemsets. From the perspective of graph computing, the transaction records can be converted to a graph G(V,E), where V is the set of vertices of G that represents the transaction records and E is the set of edges of G that represents the relations among transaction records. Each k-itemset in the transaction records will have a corresponding connected component in G. The number of vertices in the corresponding connected component is the support of the k-itemset. Since the time to find the corresponding connected component of a k-itemset in G is constant for any k, the graph computing method will be very efficient if the number of k-itemsets is relatively small. Based on Apriori and graph computing techniques, a hybrid method, called Apriori and Graph Computing (ANG), is proposed to compute the frequent itemsets. Initially, ANG uses Apriori to compute the frequent k-itemsets and then switches to the graph computing method when k becomes large (where the number of k-itemset candidates is relatively small). The experimental results show that ANG outperforms both Apriori and the graph computing method for all test cases.
机译:Apriori算法是用于关联规则挖掘的最著名和广泛接受的方法之一。在Apriori中,它使用前缀树表示k个项集,根据频繁的(k-1)个项集生成k个项集,并根据交易记录迭代遍历该前缀树来确定频繁的k个项集。当k较小时,Apriori的执行非常有效。但是,当k变大时,Apriori的执行可能会非常慢,这是因为要确定频繁的k项集的递归深度会更深。从图计算的角度来看,可以将交易记录转换为图G(V,E),其中V是代表交易记录的G的顶点集,E是代表关系的G的边集在交易记录中。事务记录中的每个k-itemset将在G中具有一个相应的连接组件。相应的连接组件中的顶点数是k-itemset的支持。由于对于任何k,在G中找到k个项集的对应连接分量的时间都是恒定的,因此,如果k个项集的数量相对较小,则图形计算方法将非常有效。基于Apriori和图计算技术,提出了一种称为Apriori和图计算(ANG)的混合方法来计算频繁项集。最初,ANG使用Apriori来计算频繁的k个项集,然后在k变大(其中k个项集候选的数量相对较少)时切换到图形计算方法。实验结果表明,对于所有测试用例,ANG的性能均优于Apriori和图形计算方法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号