Many machine learning areas use subsampling techniques with different objectives: reducing the size of the training set, equalizing the class distribution when classification errors have non-uniform costs, etc. Subsampling severely affects the behavior of classification algorithms: decision trees induced from different subsamples of the same data set differ widely in accuracy and structure. In this paper we present a new algorithm whose final classifier is a single decision tree, so it maintains the explanatory capacity of the classification. A comparison of the error and structural stability of our algorithm and the C4.5 algorithm is presented. The decision trees generated by the new algorithm achieve smaller error rates and are structurally steadier than those of C4.5 when subsampling techniques are used.
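To make the class-balancing use of subsampling mentioned above concrete, here is a minimal sketch (not the paper's algorithm) of random undersampling, in which the majority classes are sampled down to the size of the minority class; the function name and interface are illustrative assumptions.

```python
import random

def balanced_subsample(examples, labels, seed=0):
    """Randomly subsample every class down to the size of the smallest
    class, a common way to equalize an imbalanced class distribution."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    sub_x, sub_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # draw without replacement
            sub_x.append(x)
            sub_y.append(y)
    return sub_x, sub_y

# Example: 8 negatives vs 2 positives -> 2 of each after subsampling.
X = list(range(10))
y = [0] * 8 + [1] * 2
Xs, ys = balanced_subsample(X, y)
```

Because each call with a different seed yields a different subsample, trees induced from these subsamples can vary considerably, which is exactly the instability the abstract refers to.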