Frequent sets by leap-traversal for sequential and parallel paradigms.

Abstract

Efficient discovery of frequent patterns from large databases is an active research area in data mining, with broad applications in industry and deep implications in many areas of data mining. Although many efficient frequent-pattern mining techniques have been developed in the last decade, most of them assume relatively small databases, leaving extremely large but realistic datasets out of reach. When computationally feasible, mining extremely large databases produces tremendously large numbers of frequent patterns. Mining for frequent itemsets can generate an overwhelming number of patterns, often exceeding the size of the original transactional database. In many cases, it is impractical to mine those datasets due to their sheer size, not only because of the number of existing patterns, but mainly because of the magnitude of the search space.

In this research we propose a new traversal approach that jumps in the search space among only promising nodes. Our leaping approach avoids nodes that would not participate in the answer set and drastically reduces the number of candidate patterns. We use this approach to efficiently pinpoint maximal patterns at the border of the frequent patterns in the lattice and collect enough information in the process to generate all subsequent patterns.

Using this approach, we sequentially mined databases of sizes never before reported. We also generated different types of patterns and pushed constraints efficiently to filter the answer set down to only the patterns that are of interest to decision makers.

To open the doors to the mining of extremely large databases, parallelizing the search for frequent patterns plays an important role. Not all good sequential algorithms can be effectively parallelized, and parallelization alone is not enough. An algorithm has to be well suited for parallelization, and in the case of frequent pattern mining, clever methods for searching are certainly an advantage. The algorithm we propose for parallel mining of frequent maximal patterns is based on our new technique for astutely jumping within the search space and, more importantly, is composed of autonomous task segments that can be performed separately and thus minimize communication between processors. Our parallel algorithm for mining frequent patterns generates all types of patterns and supports constraint pushing. Using this approach allows the mining, in a reasonable time, of databases on the order of a billion transactions using relatively inexpensive clusters.
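To make the idea of maximal patterns at the border of the lattice concrete, here is a minimal Python sketch. It is not the leap-traversal algorithm from the thesis: it enumerates every candidate by brute force, which is exactly the cost the thesis aims to avoid. The function names, the toy transactions, and the `min_support` value are all assumptions made for illustration. The sketch only shows that, once the maximal frequent itemsets are known, every frequent itemset can be regenerated as a subset of one of them.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all frequent itemsets (tiny inputs only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def subsets_of_maximal(maximal):
    """Regenerate every frequent itemset as a non-empty subset of a maximal one."""
    regenerated = set()
    for m in maximal:
        for size in range(1, len(m) + 1):
            regenerated.update(frozenset(c) for c in combinations(sorted(m), size))
    return regenerated


# Hypothetical toy database: four transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("abc"),
    frozenset("cd"),
]

frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)

# The maximal itemsets alone are enough to recover the set of frequent itemsets.
assert subsets_of_maximal(maximal) == set(frequent)
print(sorted("".join(sorted(m)) for m in maximal))
```

Note that the maximal itemsets recover which itemsets are frequent but not their support counts. The abstract states that the proposed approach collects additional information during traversal for that purpose; this sketch does not model that step.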

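Bibliographic record

Author: El-Hajj, Mohammad Omar
Author affiliation: University of Alberta (Canada)
Degree-granting institution: University of Alberta (Canada)
Subject: Computer Science
Degree: Ph.D.
Year: 2006
Pages: 125
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Date added to database: 2022-08-17 11:39:47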
