Mining frequent patterns from sequences: Theory, algorithm, implementation, and performance.

Abstract

Mining frequent patterns from sequences is an important data mining problem with direct applications in many areas. In this thesis, we make three contributions to the state of the art in sequential frequent pattern mining.

First, we propose a fast pattern-growth mining algorithm that uses a novel sequence database representation called the First-Occurrence Linked WAP-tree (FLWAP-tree). Pattern-growth mining using the Pre-Order Linked WAP-tree (PLWAP-tree) was reported in the literature to be faster than other algorithms. We show that pattern-growth mining on our FLWAP-tree outperforms PLWAP-tree mining significantly and consistently.

Second, we extend the pattern-growth algorithm with partial enumeration so that frequent patterns can grow by more than one symbol at a time. The extended algorithm blends pattern-growth and Apriori-style enumeration mining in one framework. We show that partial enumeration can speed up pattern-growth mining when the depth of partial enumeration is properly controlled. Partial enumeration can also reduce the load imbalance among parallel tasks when the pattern-growth mining algorithm is parallelized to run on parallel computers.

Lastly, we parallelize our pattern-growth mining algorithm using partial enumeration and show that partial enumeration is essential to raising the maximum speedup achievable on parallel computers.

We present the theory, algorithm design, implementation, and experimental results for each of these contributions.
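For readers unfamiliar with the term, the basic pattern-growth idea of extending a frequent pattern by one symbol at a time can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: it uses plain projected suffixes (in the style of PrefixSpan) rather than an FLWAP-tree or PLWAP-tree, it omits partial enumeration and parallelization, and the function name `mine_frequent_patterns` and its parameters are hypothetical.

```python
from collections import Counter


def mine_frequent_patterns(sequences, min_support):
    """Return {pattern tuple: support} for every frequent subsequence.

    Illustrative pattern-growth sketch using projected suffixes; it is
    NOT the FLWAP-tree algorithm described in the abstract.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs: the part of
        # each supporting sequence that follows the earliest match of `prefix`.
        counts = Counter()
        for seq_id, start in projected:
            # Count each symbol at most once per sequence.
            counts.update(set(sequences[seq_id][start:]))

        for symbol, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (symbol,)
            results[pattern] = support

            # Project each suffix past the first occurrence of `symbol`.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                try:
                    pos = seq.index(symbol, start)
                except ValueError:
                    continue  # this sequence does not support the new pattern
                next_projected.append((seq_id, pos + 1))

            # Grow the pattern by one more symbol.
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    patterns = mine_frequent_patterns(["abc", "acb", "ab"], min_support=2)
    for pattern, support in sorted(patterns.items()):
        print("".join(pattern), support)
```

Each recursive call handles one frequent prefix independently of the others, which is why the size of these subproblems matters when the work is split across parallel tasks.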

Bibliographic record

Author: Turkia, Markus Petteri
Author affiliation: University of Arkansas at Little Rock, Computer Science
Degree-granting institution: University of Arkansas at Little Rock, Computer Science
Subject: Computer Science
Degree: M.S.
Year: 2007
Pages: 185
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology

