Journal: 南京师大学报(自然科学版) (Journal of Nanjing Normal University (Natural Science Edition))

A New Parallel Algorithm for Mining Frequent Item Sets Based on FP_Growth

     

Abstract

Frequent item set mining is used to discover association rules between items. To mine frequent item sets from big data efficiently, this paper proposes a new parallel frequent item set mining algorithm based on FP_Growth, named NPFP_Growth (New Parallel algorithm based on FP_Growth). The algorithm improves the storage structure of the frequent pattern tree. Built on the Map/Reduce parallel computing model, with HDFS for data storage, it constructs a local frequent pattern tree on each computing node and mines the longest global frequent item set in each branch of that local tree. For item sets that are not globally frequent, it computes their support counts and sends them to the corresponding computing nodes, where the supports are tallied. Frequent item sets are thereby mined in parallel with a relatively simple algorithm. Experiments show that NPFP_Growth has high computational efficiency and good scalability.
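The abstract does not give the details of NPFP_Growth, so the following is only a minimal sketch of the general idea it relies on: support counts computed locally on each partition must be combined before an item set can be judged globally frequent. It is not the paper's algorithm. It enumerates small item sets by brute force instead of building a frequent pattern tree, and the names `local_counts`, `global_frequent`, `max_size`, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, max_size=2):
    """Map-like step (assumed): count every item set of up to
    max_size items that occurs in one partition's transactions."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def global_frequent(partitions, min_support, max_size=2):
    """Reduce-like step (assumed): sum the local counts and keep the
    item sets whose global support count reaches min_support."""
    total = Counter()
    for partition in partitions:
        total.update(local_counts(partition, max_size))
    return {s: c for s, c in total.items() if c >= min_support}


# Two hypothetical partitions, standing in for data held on two nodes.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
frequent = global_frequent(partitions, min_support=3)
```

In this toy data the item set `("a", "c")` occurs only once in the first partition but three times overall, which illustrates why counts for locally infrequent item sets still have to be exchanged between nodes.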
