A fast and parallel algorithm for frequent pattern mining from big data in many-task environments

Wei-Tee Lin; Chih-Ping Chu

Abstract

Many studies have sought to discover frequent patterns in large databases efficiently. The algorithms used in these studies fall into two main categories: Apriori algorithms and frequent pattern growth (FP-growth) algorithms. Apriori algorithms follow a generate-and-test approach, so their performance suffers from having to test too many candidate itemsets. Most recent studies have therefore applied an FP-growth approach to the discovery of frequent patterns. The rapid growth of data, however, has introduced new challenges for frequent pattern mining in terms of both execution efficiency and scalability. Big data often contains a large number of items, a large number of transactions, and a long average transaction length, all of which result in large FP-trees. Beyond its dependence on data characteristics, FP-tree size is also sensitive to the minimum support threshold: a small minimum support is likely to produce many branches per node, greatly enlarging the FP-tree and increasing the number of conditional pattern-based trees that must be reconstructed. In this paper, we propose a novel algorithm and architecture for efficiently mining frequent patterns from big data in distributed many-task computing environments. Through empirical evaluation under various simulation conditions, we show that the proposed method delivers excellent execution time.
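
The sensitivity of FP-tree size to the minimum support threshold can be illustrated with a minimal sketch. The code below is the standard textbook two-pass FP-tree construction, not the algorithm proposed in the paper; the names `FPNode` and `build_fp_tree` and the toy transactions are assumptions made purely for illustration.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item label, a count, and child links."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, number of non-root nodes)."""
    # Pass 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_support}

    root = FPNode()
    node_count = 0
    # Pass 2: insert the frequent items of each transaction, ordered by
    # descending support (ties broken by item name) so prefixes are shared.
    for t in transactions:
        items = sorted(frequent & set(t), key=lambda i: (-support[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item)
                node_count += 1
            node = node.children[item]
            node.count += 1
    return root, node_count


# Hypothetical toy data set, not taken from the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "c", "e"],
    ["b", "d", "f"],
    ["a", "b", "c", "d"],
]

# Tree size (node count) for progressively smaller support thresholds.
sizes = {s: build_fp_tree(transactions, s)[1] for s in (4, 3, 2, 1)}
print(sizes)
```

On this toy data the tree has 3 nodes at a threshold of 4, 8 nodes at a threshold of 3, and 10 nodes at a threshold of 1, because each lower threshold admits more items and therefore more branches.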

Bibliographic details

Source
International Journal of High Performance Computing and Networking, 2017, Issue 3, 11 pages
Authors
Wei-Tee Lin; Chih-Ping Chu
Affiliation
Department of Computer Science and Information Engineering, National Cheng Kung University (listed for both authors)
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Data mining; Big data; Many-task computing; Frequent patterns mining
