New techniques for efficiently discovering frequent patterns.

Abstract

Because of its theoretical and practical importance, frequent pattern mining has been, and remains, one of the most active research areas in KDD. In this dissertation, we study three problems in frequent pattern mining: mining multiple datasets, mining streaming data, and mining large-scale structures from graph datasets. Our study has not only extended the breadth of frequent pattern mining but has also brought new techniques and algorithms to the field. Specifically, our contributions are as follows.

(1) Mining multiple datasets. We develop a systematic approach to generate efficient query plans for a single mining query across multiple datasets. We also propose methods to optimize multiple such queries simultaneously and to reuse past mining results in a query-intensive KDD environment. Our experimental results show speedups of up to two orders of magnitude compared with naive methods that lack these optimizations.

(2) Mining frequent itemsets over streaming data. We propose a new algorithm, StreamMining, to discover frequent itemsets over streaming data. In a single pass, StreamMining is guaranteed to find a superset of the frequent itemsets, although false positives may occur. If a second pass is allowed, StreamMining removes the false positives and finds exactly the frequent itemsets. Our detailed evaluation on both synthetic and real datasets shows that the one-pass algorithm is very accurate in practice and also very memory-efficient.

(3) Mining frequent large-scale structures from graph datasets. We develop a new framework to discover frequent large-scale structures in graph datasets. This framework is derived from the mathematical concept of a topological minor. Within this framework, we propose a new algorithm, TSMiner, which efficiently enumerates all frequent large-scale structures in a graph dataset, and a new approach, called the relabeling function, to perform constraint mining. We apply our framework to protein structure data and discover meaningful topological structures.

Finally, we demonstrate the viability and scalability of the proposed algorithms on both real and synthetic datasets.
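The one-pass/two-pass guarantee described in (2) can be illustrated with a small sketch. The abstract does not describe StreamMining's internals, so the code below is not StreamMining: it applies the Karp-Papadimitriou-Shenker counting scheme to single items rather than itemsets, and the names `one_pass_candidates`, `second_pass_exact`, and the support fraction `theta` are hypothetical. With at most floor(1/theta) counters, every item occurring at least theta * n times in a stream of n items survives the first pass; the second pass counts the survivors exactly and discards the false positives.

```python
# Illustrative sketch only (hypothetical names): Karp-Papadimitriou-Shenker
# counting for single items, NOT the StreamMining algorithm itself.
import math
from typing import Dict, Hashable, Iterable, Set


def one_pass_candidates(stream: Iterable[Hashable], theta: float) -> Set[Hashable]:
    """First pass: keep at most floor(1/theta) counters.

    Every item occurring at least theta * n times in a stream of length n is
    guaranteed to remain in the returned set; other items may remain too
    (false positives).
    """
    capacity = math.floor(1 / theta)
    counters: Dict[Hashable, int] = {}
    for item in stream:
        counters[item] = counters.get(item, 0) + 1
        if len(counters) > capacity:
            # Too many distinct items: cancel one occurrence of each tracked
            # item (including the new one) and drop counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def second_pass_exact(
    stream: Iterable[Hashable], candidates: Set[Hashable], theta: float
) -> Set[Hashable]:
    """Second pass: count only the candidates exactly and keep the frequent ones."""
    counts: Dict[Hashable, int] = {c: 0 for c in candidates}
    n = 0
    for item in stream:
        n += 1
        if item in counts:
            counts[item] += 1
    return {c for c, k in counts.items() if k >= theta * n}


if __name__ == "__main__":
    data = list("abacabadabac")  # a x6, b x3, c x2, d x1 (n = 12)
    candidates = one_pass_candidates(data, 0.25)
    frequent = second_pass_exact(data, candidates, 0.25)
    print(sorted(candidates))  # ['a', 'b', 'c', 'd'] (c and d are false positives)
    print(sorted(frequent))    # ['a', 'b'] (support >= 0.25 * 12 = 3)
```

Keeping only floor(1/theta) counters bounds first-pass memory independently of the stream length. The price is that the first pass alone cannot distinguish truly frequent items from survivors such as `c` and `d`, which is what the optional second pass resolves.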

Bibliographic record

Author: Jin, Ruoming
Affiliation: The Ohio State University
Degree-granting institution: The Ohio State University
Discipline: Computer Science
Degree: Ph.D.
Year: 2005
Pages: 188
Format: PDF
Language: English

Similar documents

1. Jerry Chun-Wei Lin, Yuyu Zhang, Philippe Fournier-Viger. Efficiently Updating the Discovered Multiple Fuzzy Frequent Itemsets with Transaction Insertion [J]. International Journal of Fuzzy Systems, 2018, no. 8.
2. Jerry Chun-Wei Lin, Wensheng Gan, Tzung-Pei Hong. An efficient algorithm to maintain the discovered frequent sequences with record deletion [J]. Intelligent Data Analysis, 2016, no. 3.
3. Kawuu W. Lin, Sheng-Hao Chung. A fast and resource efficient mining algorithm for discovering frequent patterns in distributed computing environments [J]. Future Generation Computer Systems, 2015.
4. Saravanan Suba, T. Christopher. An improved and efficient frequent pattern mining approach to discover frequent patterns among important attributes in large data set using IA-TJ-FGTT [C]. 2016 IEEE International Conference on Advances in Computer Applications, 2016.
5. Behzad Mortazavi-Asl. Discovering and mining user Web-page traversal patterns [D]. 2001.
6. O. L. Lloyd, F. A. Gailey. Techniques of low technology sampling of air pollution by metals: a comparison of concentrations and map patterns [O]. 1987.
7. Saravanan Suba. An Improved and Efficient Method to Discover the Frequent Patterns from Targeted Patterns in Transactional Dataset Using TPIITR-FPMM [O]. 2017.
8. M. Kuramochi, G. Karypis. Efficient Algorithm for Discovering Frequent Subgraphs [R]. 2002.
