首页>
外国专利>
Compressed prefix trees and estDec+ method for finding frequent itemsets over data streams
Compressed prefix trees and estDec+ method for finding frequent itemsets over data streams
展开▼
机译:压缩前缀树和estDec +方法,用于查找数据流中的频繁项集
展开▼
页面导航
摘要
著录项
相似文献
摘要
Disclosed a relates to a method for finding specific information by analyzing a large amount of data set and a method for finding frequent itemsets in a data mining system realized using the same. The mining in a data stream defined as an indefinite set of data continuously generated is directed to a method for finding valuable knowledge effectively from such data and, recently, various mining methods have been proposed. When considering the characteristics of the data stream in which data elements are indefinitely generated at high speed, these mining methods has an important requirement in that the memory usage required for the performance of the mining process be restricted within an available range. The present invention provides an effective data structure in finding frequent itemsets over data streams and finds necessary information using the data structure. The data structure proposed in the present invention is defined as a compressed prefix tree structure, and the compressed prefix tree merges or splits nodes during the mining operation by comparing the prefix tree structure applied to the conventional data mining to manage a plurality of items in a single node, thus dynamically and flexibly adjusting the tree size. Such dynamic adjustment function dynamically merges and splits nodes in the prefix tree, if the variation of itemsets that are most likely to be frequent itemsets due to the variation of the data stream, thus maximizing the accuracy of the mining result in a restricted memory space, i.e., the accuracy of frequent itemsets found. Moreover, the present invention provides a method for optimizing the memory usage that ensures an optimum mining result within an available memory range using the compressed prefix tree structure.
展开▼