Automatic classification of electronic text documents is considered. Methods for building classifiers are described: probabilistic, non-numerical, regression, Rocchio's method, neural networks, an example-based method, the support vector method, and maximum-entropy modeling. Approaches to estimating the performance and throughput of these methods are discussed. Results of evaluating the methods in the information-retrieval tradition are presented, but the main emphasis is on comparing aspects of software implementation that matter when selecting a method for particular application conditions.
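To make one of the listed methods concrete, the following is a minimal sketch of a Rocchio-style text classifier. It assumes whitespace tokenization, raw term-frequency vectors normalized to unit length, prototype weights `beta=16` and `gamma=4`, and clipping of negative prototype weights to zero. These choices and all function names are illustrative assumptions, not the implementation evaluated in the work summarized above.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length (empty vectors stay empty)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def train_rocchio(docs, labels, beta=16.0, gamma=4.0):
    """One prototype per class: beta * in-class centroid - gamma * out-of-class centroid.

    beta/gamma values are an assumed, commonly used setting, not taken from the source.
    """
    vecs = [normalize(tf_vector(d)) for d in docs]
    prototypes = {}
    for c in set(labels):
        pos = [v for v, lab in zip(vecs, labels) if lab == c]
        neg = [v for v, lab in zip(vecs, labels) if lab != c]
        cp, cn = centroid(pos), centroid(neg)
        proto = {}
        for t in set(cp) | set(cn):
            w = beta * cp.get(t, 0.0) - gamma * cn.get(t, 0.0)
            if w > 0:  # assumed convention: drop negative weights
                proto[t] = w
        prototypes[c] = normalize(proto)
    return prototypes


def cosine(a, b):
    """Dot product; equals cosine similarity because both inputs are unit-normalized."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify(prototypes, doc):
    """Assign the class whose prototype is most similar to the document."""
    v = normalize(tf_vector(doc))
    return max(prototypes, key=lambda c: cosine(v, prototypes[c]))
```

For example, after training on a few short "finance" and "sport" snippets with `train_rocchio(docs, labels)`, calling `classify(prototypes, "shares rise again")` compares the new document against each class prototype and returns the closest class. Training reduces to computing centroids, and classification is one similarity computation per class. That cheapness is the practical property that makes Rocchio a common baseline when implementation cost matters.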