Conference: IEEE International Conference on Computer Supported Cooperative Work in Design

A distributed frequent itemset mining algorithm based on Spark

Abstract

Frequent itemset mining is an important step in association rule mining. Traditional frequent itemset mining algorithms have notable limitations. For example, the Apriori algorithm must scan the input data repeatedly, which causes a high I/O load and poor performance, while the FP-Growth algorithm is limited by the capacity of main memory because it must build an FP-tree in memory and mine frequent itemsets from that tree. In the Big Data era, these limitations become more prominent when mining large-scale data. This paper proposes DPBM, a distributed matrix-based pruning algorithm built on Spark, for frequent itemset mining. DPBM starts from a matrix-based frequent itemset mining algorithm, an improved Apriori variant that scans the input data only once, and introduces a novel pruning technique that greatly reduces the number of candidate itemsets. In addition, implementing DPBM on Spark, a recent lightning-fast distributed computing framework, greatly reduces memory usage on each computing node. The experimental results show that DPBM outperforms MapReduce-based frequent itemset mining algorithms in both speed and scalability.
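The abstract's central idea, an Apriori variant that reads the data only once by first building an item-by-transaction matrix, can be illustrated with a minimal single-machine sketch. This is not DPBM itself: the abstract does not describe DPBM's novel pruning technique or its Spark implementation, so the sketch below substitutes the standard Apriori downward-closure pruning (a k-itemset is a candidate only if all of its (k-1)-subsets are frequent). It also assumes each row of the boolean matrix is stored as a Python integer bitmask, so the support of an itemset is the number of set bits in the bitwise AND of its items' rows. The function names `build_bit_matrix`, `support`, and `mine_frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_bit_matrix(transactions):
    """The single pass over the raw data: map each item to a bitmask whose
    bit `tid` is set when transaction `tid` contains the item.
    (Hypothetical bitmask encoding of the boolean item x transaction matrix.)"""
    rows = {}
    for tid, txn in enumerate(transactions):
        for item in set(txn):
            rows[item] = rows.get(item, 0) | (1 << tid)
    return rows


def support(itemset, rows):
    """Support = number of transactions containing every item in `itemset`,
    i.e. the popcount of the bitwise AND of the items' rows."""
    mask = -1  # all bits set; AND with non-negative rows yields a non-negative int
    for item in itemset:
        mask &= rows[item]
    return bin(mask).count("1")


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori mining over the matrix; no further data scans.
    Pruning here is standard Apriori downward closure, NOT DPBM's technique."""
    rows = build_bit_matrix(transactions)
    frequent = {}
    level = []
    for item in sorted(rows):  # level 1: frequent single items
        s = bin(rows[item]).count("1")
        if s >= min_support:
            frequent[(item,)] = s
            level.append((item,))

    k = 2
    while level:
        prev = set(level)
        candidates = set()
        # Join step: merge two sorted (k-1)-itemsets that share their first k-2 items.
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:
                cand = tuple(sorted(set(a) | set(b)))
                # Prune step: every (k-1)-subset must already be frequent.
                if all(sub in prev for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        # Count surviving candidates with bitwise ANDs instead of rescanning data.
        level = []
        for cand in sorted(candidates):
            s = support(cand, rows)
            if s >= min_support:
                frequent[cand] = s
                level.append(cand)
        k += 1
    return frequent


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, count in sorted(mine_frequent_itemsets(transactions, 3).items()):
        print(itemset, count)
```

On this toy data set with a minimum support of 3, the candidate (bread, diaper, milk) survives pruning but its AND-based support is only 2, so mining stops at pairs. In a distributed setting such as Spark, one would expect the transactions to be partitioned across nodes with per-partition counts summed, since support is additive over disjoint partitions; how DPBM actually distributes the matrix is not stated in the abstract.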
