We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family R of grid-regions in the plane associated with the pair of attributes. For R is not an element of R, the data can be split into two classes: data inside R and data outside R. We compute the region R_(opt) is not an element of R that minimizes the entropy of the splitting, and add the splitting associated with R_(opt) (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic.
展开▼