Journal: Expert Systems with Applications

FPO tree and DP3 algorithm for distributed parallel Frequent Itemsets Mining



Abstract

Frequent Itemsets Mining is a fundamental mining model in Data Mining. It supports a vast range of application fields and can serve as a key calculation phase in many other mining models such as Association Rules, Correlations, and Classifications. Many distributed parallel algorithms have been introduced to cope with the very large-scale datasets of Big Data. However, the problems of running time and memory scalability still lack adequate solutions for very large and "hard-to-mine" datasets. In this paper, we propose a distributed parallel algorithm named DP3 (Distributed PrePostPlus), which parallelizes the state-of-the-art algorithm PrePost(+) and operates in a Master-Slaves model. Slave machines mine local frequent itemsets and send them, together with their support counts, to the Master for aggregation. When tremendous numbers of itemsets are transferred between the Slaves and the Master, the computational load at the Master would be extremely heavy without the support of our complete FPO tree (Frequent Patterns Organization), which provides optimal compactness for light data transfers and highly efficient aggregation with pruning ability. The processing phases of the Slaves and the Master are designed for memory scalability and for shared-memory parallelism in a Work-Pool model, so as to utilize the computational power of multi-core CPUs. We conducted experiments on both synthetic and real datasets, and the empirical results show that our algorithm far outperforms the well-known PFP and three other recent high-performance algorithms: Dist-Eclat, BigFIM, and MapFIM. (C) 2019 Elsevier Ltd. All rights reserved.
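The Master-side aggregation described in the abstract can be illustrated with a small sketch. This is not the DP3 algorithm or the FPO tree from the paper: it assumes a plain prefix tree (trie), a brute-force local counter in place of PrePost(+), and Slaves that send the counts of all itemsets up to a fixed length rather than only the locally frequent ones. All names (`mine_local`, `TrieNode`, `merge_into_trie`, `frequent_itemsets`, `run_master`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_local(transactions, max_len=3):
    """Slave side (assumed): count every itemset of up to max_len items in one partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # fixed item order so itemsets share prefixes
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


class TrieNode:
    """One node of a plain prefix tree; the path from the root spells an itemset."""

    def __init__(self):
        self.children = {}
        self.support = 0


def merge_into_trie(root, local_counts):
    """Master side (assumed): add one Slave's support counts into the shared trie."""
    for itemset, count in local_counts.items():
        node = root
        for item in itemset:
            node = node.children.setdefault(item, TrieNode())
        node.support += count


def frequent_itemsets(node, min_support, prefix=()):
    """Collect globally frequent itemsets, skipping subtrees under infrequent nodes."""
    result = {}
    for item, child in sorted(node.children.items()):
        itemset = prefix + (item,)
        # Every itemset below this node is a superset of `itemset`, so it cannot
        # be frequent if `itemset` is not (anti-monotone property of support).
        if child.support >= min_support:
            result[itemset] = child.support
            result.update(frequent_itemsets(child, min_support, itemset))
    return result


def run_master(partitions, min_support, max_len=3):
    """Aggregate the counts of all partitions and return the frequent itemsets."""
    root = TrieNode()
    for partition in partitions:
        merge_into_trie(root, mine_local(partition, max_len))
    return frequent_itemsets(root, min_support)
```

Calling `run_master` with one list of transactions per Slave and a global minimum support returns a dictionary that maps each frequent itemset (a sorted tuple of items) to its total support. Sharing prefixes in the trie is one simple way to keep the aggregated structure compact. Sending only locally frequent itemsets, as the abstract describes, would additionally require handling itemsets that are frequent globally but not in every partition, which this sketch sidesteps.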
