Decision tree classification is a simple and important data mining function. Decision tree algorithms are computationally intensive, yet they do not capture evolutionary trends in an incrementally growing data repository. In conventional mining approaches, if two or more datasets are merged into a single target dataset, the entire computation for constructing a classifier has to be carried out again from scratch. Previous work in this field has constructed individual decision tree classifiers and combined them either by voted arbitration or by merging the corresponding decision rules. We take a different approach: we preprocess the individual windows of the growing database and store the information extracted from each window as a Knowledge Concentrate (KC). The KCs are formed in off-line mode. Mining operations then use the KCs instead of the entire past data, thereby reducing the time and space complexity of the mining process. The user dynamically selects the target dataset by identifying the windows of interest. The mining requirement is satisfied by merging the respective KCs and running the decision tree algorithm on the merged KC. The proposed system operates in three phases. The first is the planning phase, in which the dataset domain information is gathered and the data mining goals are defined. The second phase makes a single scan of a window in the database and generates a summary of that window as a KC. In our work we use an efficient dynamic trie structure to store the KCs. The third phase merges the KCs of the desired windows and applies the classification algorithm to the aggregate to produce the final classifier.
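As a rough illustration of the second and third phases, the sketch below builds a count-annotated trie from one window in a single scan and merges two such tries by adding counts. The abstract does not specify what a KC stores, so the layout here is an assumption: each record is taken to be a fixed-length tuple of categorical attribute values ending in a class label, and the names `TrieNode`, `build_kc`, `merge_kc`, and `iter_records` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One trie node; count = number of records whose path passes through it."""
    count: int = 0
    children: dict = field(default_factory=dict)


def build_kc(window):
    """Summarize one window in a single scan (assumed KC layout, see lead-in)."""
    root = TrieNode()
    for record in window:
        node = root
        node.count += 1
        for value in record:
            node = node.children.setdefault(value, TrieNode())
            node.count += 1
    return root


def merge_kc(a, b):
    """Return a new trie whose counts are the sums of the counts in a and b."""
    merged = TrieNode(count=a.count + b.count)
    empty = TrieNode()  # stands in for a branch missing from one side
    for key in set(a.children) | set(b.children):
        merged.children[key] = merge_kc(
            a.children.get(key, empty), b.children.get(key, empty)
        )
    return merged


def iter_records(node, prefix=()):
    """Yield (record, count) pairs for every distinct record stored in the trie."""
    if not node.children:
        if prefix:
            yield prefix, node.count
        return
    for value, child in node.children.items():
        yield from iter_records(child, prefix + (value,))


# Two toy windows of (outlook, temperature, class) records.
window_1 = [("sunny", "hot", "no"), ("sunny", "hot", "no"), ("rain", "mild", "yes")]
window_2 = [("sunny", "hot", "no"), ("rain", "cool", "yes")]

merged = merge_kc(build_kc(window_1), build_kc(window_2))
condensed = dict(iter_records(merged))
```

Under these assumptions, merging touches only the trie nodes, not the raw records, so its cost depends on the number of distinct record prefixes rather than on the size of the original windows.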
The central issue addressed in this work is the construction of a condensed form of the database from which the patterns needed as input to a decision tree induction algorithm can be extracted. The entire scheme is independent of the decision tree algorithm, in the sense that the user is free to use any standard decision tree algorithm.
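One way to see why a condensed representation can serve any standard algorithm is that most split criteria need only class counts per attribute value, not the raw rows. The sketch below computes information gain (the ID3-style criterion, chosen here purely as an example) from a table of distinct records and their counts. The table format and the names `entropy` and `information_gain` are assumptions for illustration, not the representation used in the paper.

```python
from collections import Counter
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total) for n in class_counts.values() if n)


def information_gain(condensed, attr_index):
    """Information gain of one attribute, computed from record counts only.

    condensed maps (attr_0, ..., attr_k, class_label) -> number of occurrences.
    """
    overall = Counter()
    by_value = {}
    for record, n in condensed.items():
        label = record[-1]
        overall[label] += n
        by_value.setdefault(record[attr_index], Counter())[label] += n
    total = sum(overall.values())
    remainder = sum(
        sum(counts.values()) / total * entropy(counts) for counts in by_value.values()
    )
    return entropy(overall) - remainder


# Toy condensed table: (outlook, temperature, class) -> count.
condensed = {
    ("sunny", "hot", "no"): 2,
    ("sunny", "mild", "no"): 1,
    ("rain", "mild", "yes"): 1,
    ("rain", "hot", "yes"): 1,
}

gain_outlook = information_gain(condensed, 0)
gain_temperature = information_gain(condensed, 1)
```

In this toy table the first attribute separates the classes perfectly, so its gain equals the full class entropy, while the second attribute yields a much smaller gain. Swapping in another criterion, such as the Gini index, would change only the `entropy` function.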