Journal: 计算机应用研究 (Application Research of Computers)

IABS: An Improved Apriori Algorithm Based on Spark

     

Abstract

The Apriori algorithm is one of the most classical algorithms in association rule mining, and its core problem is the generation of frequent itemsets. The classical algorithm has two known weaknesses: it must scan the transaction database multiple times, and it must generate candidate itemsets. This paper first optimizes Apriori by transforming the storage structure and eliminating the candidate-itemset generation step. Moreover, as data volumes grow daily in the era of big data, traditional algorithms face severe challenges. The paper therefore combines the optimized Apriori with the Spark platform and, taking full advantage of Spark features such as in-memory computation and resilient distributed datasets, proposes IABS (improved Apriori algorithm based on Spark). Comparison with existing algorithms of the same kind validates the data scalability (sizeup) and node scalability of IABS, and IABS achieves an average performance improvement of 23.88% across multiple datasets. The improvement becomes more pronounced as the data volume grows.
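To make the two costs named in the abstract concrete (repeated database scans and candidate itemset generation), the following is a minimal sketch of the classical, single-machine Apriori algorithm. It is not the IABS algorithm: the abstract does not give the details of its storage transformation or its Spark implementation, so none are assumed here. The function name `apriori`, the `scans` counter, and the toy transactions are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classical Apriori (illustrative sketch, not IABS).

    Returns (frequent, scans): a dict mapping each frequent itemset
    (as a frozenset) to its support count, and the number of full
    passes made over the transaction list.
    """
    transactions = [frozenset(t) for t in transactions]
    scans = 0

    # Pass 1: one full scan to count individual items.
    scans += 1
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # any candidate that has an infrequent (k-1)-subset.
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        if not candidates:
            break

        # Support counting: another full scan for every level k.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result, scans


if __name__ == "__main__":
    # Hypothetical toy data, chosen only for illustration.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    itemsets, passes = apriori(data, min_support=3)
    for itemset, count in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    print("full scans:", passes)
```

In this sketch every level k costs one candidate-generation step and one further pass over all transactions, which are the two overheads the paper says it removes.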
