Optimization of a MapReduce-Based Frequent Pattern Mining Algorithm

Wang Bo (王波); Wang Huaibin (王怀彬); Zhang Chao (张超)

Abstract

Distributed data mining is a key technique in big data research. Existing distributed methods for frequent pattern mining still have many limitations on large data sets. Parallel Apriori, for example, places a heavy burden on I/O because it scans the database many times, and it generates a large number of candidate sets. The FP-growth algorithm used in this paper consists of two stages: FP-tree construction and frequent pattern mining. The main idea is, in the map stage and before the FP-tree is built, to merge FP-tree nodes according to a step value and the encoding of the item elements, and then, in the shuffle stage, to assign them to different reducers according to a balancing algorithm. The balancing algorithm evens out the workload: it reduces the randomness of data distribution and avoids the situation in which uneven data partitioning leaves some reducers with excessive overhead during the mining stage. Experimental results show that, compared with existing methods, the improved algorithm achieves better computational efficiency and scalability on larger data sets.
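The abstract does not spell out the balancing algorithm, so the following is only a minimal sketch of the general idea of spreading items across reducers by estimated load. It assumes that an item's support count is a usable proxy for the mining work it causes, and it substitutes a plain greedy "heaviest item to the lightest reducer" rule for the paper's method. The names `count_items` and `balanced_groups` are hypothetical, and the step-value node merging is not modelled.

```python
import heapq
from collections import Counter


def count_items(transactions):
    """Count the number of transactions in which each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count an item once per transaction
    return counts


def balanced_groups(item_counts, num_reducers, min_support=1):
    """Assign frequent items to reducers so that estimated loads stay even.

    Assumption: the support count stands in for the reducer's workload.
    This is a generic greedy heuristic, not the paper's balancing algorithm.
    """
    frequent = [(item, c) for item, c in item_counts.items() if c >= min_support]
    # Heaviest items first; ties broken by item name for deterministic output.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))

    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    loads = [0] * num_reducers
    for item, c in frequent:
        load, r = heapq.heappop(heap)  # reducer with the smallest load so far
        assignment[item] = r
        loads[r] = load + c
        heapq.heappush(heap, (loads[r], r))
    return assignment, loads


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
    assignment, loads = balanced_groups(count_items(transactions), num_reducers=2)
    print(assignment)
    print(loads)
```

In a real MapReduce job a mapping like `assignment` would typically drive a custom partitioner, so that each reducer receives only the item groups assigned to it rather than a hash-based split.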

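Bibliographic details

Source
《天津理工大学学报》 (Journal of Tianjin University of Technology) | 2018, No. 1 | pp. 6-11 | 6 pages
Authors
Wang Bo (王波); Wang Huaibin (王怀彬); Zhang Chao (张超)
Affiliation
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384 (listed for all three authors)
Original format: PDF
Language of text: Chinese
CLC classification: Algorithm theory
Keywords
MapReduce; frequent pattern mining; FP-growth algorithm; balancing algorithm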
