Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident data sets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale to large data sets and to classify data sets irrespective of the number of classes, attributes, and examples (records), making it an attractive tool for data mining.
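To make the combination of pre-sorting and breadth-first growth concrete, the following is a minimal sketch in Python, not the paper's implementation. It assumes (1) a numeric attribute is sorted once into an "attribute list" of (value, record id) pairs, (2) a per-record class list stores each record's label and current leaf, and (3) the gini index is the split criterion. The function names `presort`, `best_splits`, and `apply_splits` and the child numbering `2*leaf+1` / `2*leaf+2` are hypothetical. The point it illustrates is that, because the list never needs re-sorting, a single sequential scan evaluates the best split `value <= threshold` for every leaf at the current tree level at once.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def gini(counts: Counter) -> float:
    """Gini index 1 - sum_c p_c^2 of a class histogram."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def presort(values: List[float]) -> List[Tuple[float, int]]:
    """Attribute list: (value, record id) pairs, sorted once up front (assumed layout)."""
    return sorted((v, rid) for rid, v in enumerate(values))


def best_splits(attr_list, class_list, leaf_of) -> Dict[int, Tuple[float, float]]:
    """One sequential scan of the pre-sorted attribute list finds, for every
    leaf at the current level, the split 'value <= threshold' with the lowest
    weighted gini. Returns {leaf: (gini_split, threshold)}."""
    # Total class histogram per leaf, built from the (hypothetical) class list.
    totals: Dict[int, Counter] = defaultdict(Counter)
    for rid, label in enumerate(class_list):
        totals[leaf_of[rid]][label] += 1

    below: Dict[int, Counter] = defaultdict(Counter)  # class counts left of the cursor
    last_value: Dict[int, float] = {}
    best: Dict[int, Tuple[float, float]] = {}

    for value, rid in attr_list:
        leaf = leaf_of[rid]
        # A new distinct value within this leaf makes "<= previous value" a candidate.
        if leaf in last_value and value != last_value[leaf]:
            left = below[leaf]
            right = totals[leaf] - left  # Counter subtraction keeps positive counts only
            n_l, n_r = sum(left.values()), sum(right.values())
            n = n_l + n_r
            score = n_l / n * gini(left) + n_r / n * gini(right)
            if leaf not in best or score < best[leaf][0]:
                best[leaf] = (score, last_value[leaf])
        below[leaf][class_list[rid]] += 1
        last_value[leaf] = value
    return best


def apply_splits(attr_list, leaf_of, splits) -> List[int]:
    """Breadth-first step: move every record of a split leaf to a child leaf.
    The child numbering 2*leaf+1 / 2*leaf+2 is a hypothetical convention."""
    new_leaf = list(leaf_of)
    for value, rid in attr_list:
        leaf = leaf_of[rid]
        if leaf in splits:
            _, threshold = splits[leaf]
            new_leaf[rid] = 2 * leaf + 1 if value <= threshold else 2 * leaf + 2
    return new_leaf


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    labels = ["a", "a", "b", "b"]
    leaves = [0, 0, 0, 0]  # every record starts in the root leaf
    attr = presort(values)
    splits = best_splits(attr, labels, leaves)
    print(splits)                              # {0: (0.0, 2.0)}
    print(apply_splits(attr, leaves, splits))  # [1, 1, 2, 2]
```

Only the class list is indexed by record id; the attribute list is read strictly in order, which is what would let it live on disk and be streamed once per tree level. A real system would repeat the scan for each attribute and handle categorical attributes separately; this sketch covers a single numeric attribute only.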