Research on an Improved Association Rule Algorithm Based on Spark

Journal: 《电子技术应用》

Abstract

The Apriori association rule algorithm suffers from long computation times and low efficiency when it faces massive data in the era of information explosion. To address this, the data are stored in a specific data structure to reduce the number of passes over the data, a pruning step is performed before the join step, and the criterion used by the pruning step is changed. The improved algorithm, IApriori, is then combined with Apache Spark, an in-memory parallel computing framework for big data, to give a Spark-based improved Apriori algorithm (Spark+IApriori). Experimental results show that Spark+IApriori outperforms Apriori in both cluster scalability and speedup.
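
The abstract does not say which data structure or which pruning criterion IApriori uses, so the following is only a minimal single-machine sketch under two stated assumptions. First, it assumes a vertical layout that maps each item to the set of transaction ids containing it, so the data are scanned once. Second, it assumes a pre-join pruning rule that drops any (k-1)-itemset containing an item that appears in fewer than k-1 frequent (k-1)-itemsets. The function names `build_tidsets`, `prune_before_join`, and `frequent_itemsets` are hypothetical, and the distribution of this work across a Spark cluster is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_tidsets(transactions):
    """Scan the data once; map each item to the ids of transactions containing it.

    Assumption: this vertical layout stands in for the unspecified data structure.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return tidsets


def prune_before_join(level, k):
    """Drop (k-1)-itemsets that cannot be part of any frequent k-itemset.

    Assumed criterion: every item of a frequent k-itemset occurs in k-1 of its
    frequent (k-1)-subsets, so an item seen in fewer than k-1 itemsets of
    `level` cannot appear in any frequent k-itemset.
    """
    counts = defaultdict(int)
    for itemset in level:
        for item in itemset:
            counts[item] += 1
    return [s for s in level if all(counts[i] >= k - 1 for i in s)]


def frequent_itemsets(transactions, min_support):
    """Return {sorted item tuple: support count} for all frequent itemsets."""
    tidsets = build_tidsets(transactions)
    current = {(item,): tids for item, tids in tidsets.items()
               if len(tids) >= min_support}
    result = {s: len(t) for s, t in current.items()}
    k = 2
    while current:
        # Prune first, then join only the surviving itemsets.
        survivors = sorted(prune_before_join(current.keys(), k))
        nxt = {}
        for a, b in combinations(survivors, 2):
            # Join step: merge two itemsets that share their first k-2 items.
            if a[:-1] == b[:-1]:
                candidate = a + (b[-1],)
                # Support comes from intersecting id sets, not from a rescan.
                tids = current[a] & current[b]
                if len(tids) >= min_support:
                    nxt[candidate] = tids
        result.update({s: len(t) for s, t in nxt.items()})
        current = nxt
        k += 1
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item sets and an absolute support threshold returns every frequent itemset with its support count. In a Spark setting, one plausible arrangement would be to partition the transactions across workers and aggregate the per-item id sets or counts, but that design is an assumption and is not taken from the abstract.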
