The inherent flexibility of trees in both structure and semantics lets them capture most kinds of data, model a wide variety of data sources, and produce an enormous amount of information. The ability to extract valuable knowledge from tree data is becoming increasingly important and desirable. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent patterns in massive tree datasets, because most of them rely on the Apriori property for candidate generation and frequency counting. Some of the major problems are due to (1) modeling data as a hierarchical tree structure, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input dataset, and (4) heavy memory dependency. A more efficient and practical approach for tree data is therefore required. In this paper, we systematically develop a pattern-growth method, instead of the Apriori method, for mining maximal frequent tree patterns, which are a special class of frequent patterns over a set of trees. The proposed method not only removes the step of pruning infrequent subtrees but also completely eliminates the generation of candidate subtrees. Hence, it significantly improves the overall mining process.