FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

Xun Yaling; Zhang Jifu; Qin Xiao

Abstract

Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemset mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree (FIU-tree) rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, and the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop’s performance, we develop a workload balance metric to measure load balance across the cluster’s computing nodes. We develop FiDoop-HD, an extension of FiDoop, to speed up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
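The decompose-then-combine idea in the third job can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names (`map_decompose`, `reduce_count`, `mine`) are hypothetical, a plain `Counter` stands in for the FIU-tree, the first two jobs are skipped, and the sketch assumes the mapper emits every non-empty sub-itemset keyed by its length. It only shows how grouping decomposed itemsets by length lets each length be counted independently.

```python
from collections import Counter
from itertools import combinations


def map_decompose(itemset):
    """Mapper sketch: emit (length, sub-itemset) for every non-empty subset.

    Assumption: full subset enumeration; the paper's decomposition may differ.
    """
    items = sorted(itemset)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield k, combo


def reduce_count(pairs, min_support):
    """Reducer sketch: group sub-itemsets by length and keep the frequent ones.

    A Counter replaces the small ultrametric tree used in the paper.
    """
    counts_by_length = {}
    for k, combo in pairs:
        counts_by_length.setdefault(k, Counter())[combo] += 1
    return {
        k: {c: n for c, n in counts.items() if n >= min_support}
        for k, counts in counts_by_length.items()
    }


def mine(transactions, min_support):
    """Run the map and reduce sketches in sequence in one process."""
    pairs = [pair for t in transactions for pair in map_decompose(t)]
    return reduce_count(pairs, min_support)


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
    print(mine(data, min_support=2))
```

In this sketch an itemset of length n produces 2^n - 1 sub-itemsets, so longer itemsets cost far more to decompose than short ones, which is consistent with the abstract's remark that performance is sensitive to data distribution and dimensions.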


Bibliographic details

Source: IEEE Transactions on Systems, Man, and Cybernetics, 2016, No. 3, pp. 313-325 (13 pages)
Authors: Xun Yaling; Zhang Jifu; Qin Xiao
Affiliation: Taiyuan University of Science and Technology, Taiyuan, China
Original format: PDF
Language: English
Keywords: Frequent itemsets; Hadoop cluster; MapReduce; frequent items ultrametric tree (FIU-tree); load balance

