Journal: IEEE Transactions on Systems, Man, and Cybernetics

FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

Abstract

Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. To address this problem, we design a parallel frequent itemset mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP trees. FiDoop completes the mining task with three MapReduce jobs. In the crucial third job, the mappers independently decompose itemsets, and the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets of different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We also develop FiDoop-HD, an extension of FiDoop, to speed up mining of high-dimensional data. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
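The decompose-then-combine idea in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: it runs in plain Python rather than on Hadoop, all function names (`frequent_items`, `prune`, `decompose_mapper`, `combine_reducer`, `mine`) are hypothetical, and a plain `Counter` stands in for the small ultrametric trees that the reducers build in FiDoop. The sketch assumes the mapper emits every non-empty subset of a pruned transaction, keyed by subset length, which may differ from the paper's exact decomposition scheme.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_items(transactions, min_support):
    """Find items that occur in at least min_support transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= min_support}


def prune(transactions, frequent):
    """Drop infrequent items and yield each transaction as a sorted tuple."""
    for t in transactions:
        kept = tuple(sorted(set(t) & frequent))
        if kept:
            yield kept


def decompose_mapper(itemset):
    """Mapper stand-in: emit (length, subset) for every non-empty subset.

    A k-itemset yields 2**k - 1 pairs, so longer itemsets cost far more.
    """
    for size in range(1, len(itemset) + 1):
        for subset in combinations(itemset, size):
            yield size, subset


def combine_reducer(subsets, min_support):
    """Reducer stand-in: count the subsets of one length and keep frequent ones.

    FiDoop builds a small ultrametric tree here; a Counter is used instead.
    """
    counts = Counter(subsets)
    return {s: c for s, c in counts.items() if c >= min_support}


def mine(transactions, min_support):
    """Run the three stages in sequence inside one process."""
    frequent = frequent_items(transactions, min_support)
    groups = defaultdict(list)  # plays the role of the shuffle phase
    for itemset in prune(transactions, frequent):
        for size, subset in decompose_mapper(itemset):
            groups[size].append(subset)
    result = {}
    for subsets in groups.values():
        result.update(combine_reducer(subsets, min_support))
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
    for itemset, support in sorted(mine(data, 2).items()):
        print(itemset, support)
```

Because the number of emitted subsets grows exponentially with itemset length, reducers that receive longer itemsets do more work, which is consistent with the abstract's observation that performance is sensitive to data distribution and dimensionality.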
