New techniques for efficiently discovering frequent patterns.

Abstract

Because of its theoretical and practical importance, frequent pattern mining has been, and remains, one of the most active research areas in KDD. In this dissertation, we study three problems in frequent pattern mining: mining multiple datasets, mining streaming data, and mining large-scale structures from graph datasets. Our study not only extends the breadth of frequent pattern mining but also brings new techniques and algorithms to the field. Specifically, our contributions are as follows. (1) Mining multiple datasets. We develop a systematic approach to generating efficient query plans for a single mining query across multiple datasets. We also propose methods to optimize multiple such queries simultaneously and to reuse past mining results in a query-intensive KDD environment. Our experimental results show speedups of up to two orders of magnitude over naive methods that lack these optimizations. (2) Mining frequent itemsets over streaming data. We propose a new algorithm, StreamMining, to discover frequent itemsets over streaming data. In a single pass, StreamMining is guaranteed to find a superset of the frequent itemsets, although false positives may occur. If a second pass is allowed, StreamMining can remove the false positives and find the exact set of frequent itemsets. Our detailed evaluation on both synthetic and real datasets shows that the one-pass algorithm is very accurate in practice and is also very memory efficient. (3) Mining frequent large-scale structures from graph datasets. We develop a new framework for discovering frequent large-scale structures in graph datasets. The framework is derived from the mathematical concept of a topological minor. Within this framework, we propose a new algorithm, TSMiner, which efficiently enumerates all the frequent large-scale structures in a graph dataset, and a new approach, called the relabeling function, for performing constraint mining. We apply our framework to protein structure data and discover meaningful topological structures. Finally, we demonstrate the viability and scalability of the proposed algorithms on both real and synthetic datasets.
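The one-pass/two-pass guarantee described in contribution (2) can be illustrated with a small sketch. The abstract does not describe how StreamMining works internally, so the code below is not that algorithm. It swaps in the classic Misra-Gries counter summary, applied to single items rather than itemsets, because it has the same shape of guarantee: one pass yields a superset of the frequent items (possibly with false positives), and an optional second pass removes them. The function names and the threshold parameter `theta` are assumptions made for illustration.

```python
import math
from collections import Counter


def one_pass_candidates(stream, theta):
    """Single pass with a Misra-Gries summary (illustrative, not StreamMining).

    Returns a set guaranteed to contain every item whose frequency exceeds
    theta * n, where n is the stream length. It may also contain
    infrequent items (false positives).
    """
    capacity = math.ceil(1 / theta)  # number of counters kept in memory
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def two_pass_frequent(data, theta):
    """Second pass: count only the candidates exactly and filter them.

    `data` must be re-iterable (for example a list), since it is read twice.
    """
    candidates = one_pass_candidates(data, theta)
    n = 0
    exact = Counter()
    for item in data:
        n += 1
        if item in candidates:
            exact[item] += 1
    return {item for item, count in exact.items() if count > theta * n}


data = ["a", "b", "a", "c", "a", "d", "a", "b", "a", "e"]
print(one_pass_candidates(data, 0.3))  # contains "a"; "b" is a false positive
print(two_pass_frequent(data, 0.3))    # {'a'}
```

In this toy run only "a" (5 of 10 occurrences) exceeds the 30% threshold. The single pass keeps "b" as well, and the second pass discards it after exact counting. Memory use in the first pass is bounded by the number of counters, independent of stream length.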

Bibliographic details

  • Author: Jin, Ruoming
  • Author affiliation: The Ohio State University
  • Degree-granting institution: The Ohio State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 188 p.
  • Total pages: 188
  • Original format: PDF
  • Language of text: English
