Journal: International Journal of High Performance Computing and Networking

A fast and parallel algorithm for frequent pattern mining from big data in many-task environments



Abstract

Many studies have tried to efficiently discover frequent patterns in large databases. The algorithms used in these studies fall into two main categories: Apriori algorithms and frequent pattern growth (FP-growth) algorithms. Apriori algorithms follow a generate-and-test approach, so their performance suffers from testing too many candidate itemsets. Therefore, most recent studies have applied an FP-growth approach to the discovery of frequent patterns. The rapid growth of data, however, has introduced new challenges for frequent pattern mining, in terms of both execution efficiency and scalability. Big data often contains a large number of items, a large number of transactions, and a long average transaction length, all of which result in large FP-trees. In addition to its dependence on data characteristics, FP-tree size is also sensitive to the minimum support threshold: a small support threshold is likely to create many branches per node, greatly enlarging the FP-tree and increasing the number of reconstructed conditional pattern-based trees. In this paper, we propose a novel algorithm and architecture for efficiently mining frequent patterns from big data in distributed many-task computing environments. Through empirical evaluation under various simulation conditions, we show that the proposed method delivers excellent execution time.
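The sensitivity of FP-tree size to the minimum support threshold can be illustrated with a small sketch. The code below is not the algorithm proposed in the paper; it is a minimal single-machine FP-tree construction, and the function name `fp_tree_node_count` and the toy `transactions` list are hypothetical, chosen only for illustration. It counts how many nodes the tree contains as the threshold is lowered.

```python
from collections import Counter


def fp_tree_node_count(transactions, min_support):
    """Build an FP-tree as nested dicts and return its node count (root excluded).

    Illustrative sketch only: a plain, single-machine FP-tree build,
    not the distributed method described in the abstract.
    """
    # First pass: count in how many transactions each item appears.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in support.items() if count >= min_support}

    root = {}
    nodes = 0
    # Second pass: insert each transaction with its frequent items ordered
    # by descending support (ties broken alphabetically) so prefixes are shared.
    for t in transactions:
        items = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-support[item], item),
        )
        node = root
        for item in items:
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


# Hypothetical toy data set, assumed for this example only.
transactions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "c", "e"],
    ["b", "d", "f"],
    ["a", "b", "c", "g"],
]

for threshold in (3, 2, 1):
    print(threshold, fp_tree_node_count(transactions, threshold))
```

On this toy input the tree grows from 5 nodes at a threshold of 3 to 7 nodes at 2 and 10 nodes at 1, because each lower threshold admits more infrequent items and therefore more branches.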
