Decision tree (DT) induction is among the more popular of the data mining techniques. An important component of DT induction algorithms is the splitting method, with the most commonly used methods being based on the conditional entropy family. However, it is well known that there is no single splitting method that will give the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family that is based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting method, other datasets are very sensitive to it. For example, some of the CAMI family methods may be more appropriate than GainRatio (GR) for datasets where all non-class attributes are nominal; some of the CAMI methods perform as well as GR for datasets where all the non-class attributes are either integer or continuous. Given the fact that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family should be included in data mining toolsets.
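For readers unfamiliar with entropy-based splitting, the following is a minimal sketch of the GainRatio (GR) criterion mentioned above, as popularized by C4.5: information gain (mutual information between the class and the attribute) normalized by the split information. The function names and toy data are illustrative, not from the paper; the CAMI measures themselves are defined in the paper body, not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr_values, labels):
    """GainRatio = InformationGain / SplitInformation for one nominal attribute."""
    n = len(labels)
    # Partition the class labels by attribute value.
    partitions = {}
    for v, y in zip(attr_values, labels):
        partitions.setdefault(v, []).append(y)
    # Conditional entropy H(class | attribute) over the induced partition.
    cond_entropy = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - cond_entropy
    # Split information: entropy of the partition sizes themselves,
    # which penalizes attributes that fragment the data into many branches.
    split_info = entropy(attr_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: a perfectly predictive two-valued nominal attribute.
attr = ["a", "a", "b", "b"]
cls  = ["yes", "yes", "no", "no"]
print(gain_ratio(attr, cls))  # 1.0: gain of 1 bit divided by split info of 1 bit
```

A DT induction algorithm would evaluate such a score for every candidate attribute at each node and branch on the highest-scoring one; the paper's comparison amounts to swapping this scoring function for members of the CAMI family.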