Mining frequent patterns from sequences: Theory, algorithm, implementation, and performance.

Abstract

Mining frequent patterns from sequences is an important data mining problem with direct applications in many areas. In this thesis, we make three contributions to the state of the art in sequential frequent pattern mining.

First, we propose a fast pattern-growth mining algorithm that uses a novel sequence database representation called the First-Occurrence Linked WAP-tree (FLWAP-tree). The pattern-growth mining algorithm using the Pre-Order Linked WAP-tree (PLWAP-tree) was reported in the literature to be faster than other algorithms. We show that pattern-growth mining with our FLWAP-tree outperforms PLWAP-tree mining significantly and consistently.

Second, we extend the pattern-growth algorithm with partial enumeration so that frequent patterns can grow by more than one symbol at a time. The extended algorithm can be regarded as blending pattern-growth and Apriori-style enumeration mining in one framework. We show that partial enumeration can speed up pattern-growth mining when the depth of partial enumeration is properly controlled. Partial enumeration can also reduce the load imbalance among parallel tasks when the pattern-growth mining algorithm is parallelized to run on parallel computers.

Lastly, we parallelize our pattern-growth mining algorithm using partial enumeration and show that partial enumeration is essential to raising the maximum speedup achievable on parallel computers.

For each of these contributions, we present the theory, algorithm design, implementation, and experimental results.
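For readers unfamiliar with the term, the general idea of pattern-growth mining over sequences can be illustrated with a minimal sketch. This is not the FLWAP-tree or PLWAP-tree algorithm from the thesis, whose details the abstract does not give; it is a plain prefix-projection miner over an in-memory list of sequences, and the function name `mine_frequent_sequences` and its parameters are hypothetical. It grows each frequent pattern by one symbol at a time, following the first occurrence of that symbol in each remaining suffix.

```python
from collections import defaultdict


def mine_frequent_sequences(sequences, min_support):
    """Return {pattern (tuple of symbols): support} for every frequent subsequence.

    Hypothetical illustration of generic pattern-growth mining; support is the
    number of input sequences that contain the pattern as a subsequence.
    """
    results = {}

    def grow(prefix, projections):
        # projections holds (sequence index, offset) pairs; each offset marks
        # the suffix that follows the earliest match of `prefix`.
        first_occurrence = defaultdict(list)
        for seq_id, start in projections:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                symbol = seq[pos]
                if symbol not in seen:
                    # Record only the first occurrence per suffix, so each
                    # sequence contributes at most once to a symbol's support.
                    seen.add(symbol)
                    first_occurrence[symbol].append((seq_id, pos + 1))
        for symbol, next_projections in sorted(first_occurrence.items()):
            support = len(next_projections)
            if support >= min_support:
                pattern = prefix + (symbol,)
                results[pattern] = support
                # Recurse on the projected suffixes to extend the pattern.
                grow(pattern, next_projections)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    for pattern, support in mine_frequent_sequences(["abc", "acb", "ab"], 2).items():
        print("".join(pattern), support)
```

Growing by a single symbol per step is what the abstract's partial enumeration is said to relax; that extension is not sketched here because the abstract does not describe how it is done.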
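Bibliographic record

  • Author

    Turkia, Markus Petteri

  • Author affiliation

    University of Arkansas at Little Rock, Computer Science

  • Degree-granting institution: University of Arkansas at Little Rock, Computer Science
  • Subject: Computer Science
  • Degree: M.S.
  • Year: 2007
  • Pages: 185 p.
  • Total pages: 185
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology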
