Compressed prefix trees and estDec+ method for finding frequent itemsets over data streams

Abstract

Disclosed is a method for finding specific information by analyzing a large data set, and a method for finding frequent itemsets in a data mining system implemented using the same. Mining a data stream, defined as an unbounded set of continuously generated data, aims to extract valuable knowledge from such data efficiently, and various mining methods have recently been proposed. Because data elements in a stream are generated indefinitely and at high speed, an important requirement for these methods is that the memory used by the mining process stay within an available range. The present invention provides an effective data structure for finding frequent itemsets over data streams and finds the necessary information using that structure. The proposed data structure is a compressed prefix tree. Compared with the prefix tree used in conventional data mining, it manages a plurality of items in a single node and merges or splits nodes during the mining operation, thus adjusting the tree size dynamically and flexibly. When the set of itemsets most likely to be frequent changes as the data stream changes, this dynamic adjustment merges and splits nodes in the prefix tree, thus maximizing the accuracy of the mining result, i.e., of the frequent itemsets found, in a restricted memory space. Moreover, the present invention provides a method for optimizing memory usage that ensures an optimum mining result within an available memory range using the compressed prefix tree structure.
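
The idea of holding several items in one node can be illustrated with a minimal Python sketch. It is not the patented estDec+ procedure: the abstract gives no merge criterion, so the names `Node`, `build_prefix_tree`, `compress`, `count_nodes`, and the `merge_gap` threshold are all assumptions made for illustration. The sketch builds an ordinary prefix tree of itemset counts, then folds a node's only child into it when their counts differ by at most `merge_gap`. Splitting, the inverse operation, is not shown.

```python
from itertools import combinations


class Node:
    """Prefix-tree node; holds one item before merging, several after."""

    def __init__(self, items, count=0):
        self.items = list(items)
        self.max_count = count  # count of the shortest itemset this node covers
        self.min_count = count  # count of the longest itemset this node covers
        self.children = {}      # first item of child -> Node


def build_prefix_tree(transactions):
    """Count every non-empty subset of every transaction (small inputs only)."""
    root = Node([])
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                node = root
                for item in subset:
                    if item not in node.children:
                        node.children[item] = Node([item])
                    node = node.children[item]
                node.max_count += 1
                node.min_count += 1
    return root


def compress(node, merge_gap, is_root=True):
    """Hypothetical merge rule: absorb an only child whose count is close."""
    for child in node.children.values():
        compress(child, merge_gap, is_root=False)
    if is_root:
        return node  # the empty root is never merged
    while len(node.children) == 1:
        (child,) = node.children.values()
        if node.max_count - child.min_count > merge_gap:
            break
        node.items += child.items
        node.min_count = child.min_count
        node.children = child.children
    return node


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["d"]]
tree = build_prefix_tree(transactions)
nodes_before = count_nodes(tree)
compress(tree, merge_gap=1)
nodes_after = count_nodes(tree)
print(nodes_before, nodes_after)
```

A larger `merge_gap` yields a smaller tree at the cost of coarser counts, since a merged node keeps only the largest and smallest count of the itemsets it covers. This is the memory and accuracy trade-off the abstract describes.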

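Bibliographic data

Publication number: US2007198548A1
Publication date: 2007-08-23
Original format: PDF
Applicant/assignee: WON SUK LEE
Application number: US20060604368
Inventor: WON SUK LEE
Filing date: 2006-11-27
Classification: G06F7/00
Country: US
Date added to database: 2022-08-21 21:05:30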
