Methods, apparatus, and machine-readable media are described for selecting a training set from a larger data set. Samples are divided into a training set and a validation set, each of which meets one or more conditions. For each class to be modeled, multiple training sets are created, and models are trained on each of these training sets. A sample size for each class is determined based on the trained models. A training data set is then created that includes a number of samples based on the determined sample sizes.
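The flow summarized above can be illustrated with a minimal Python sketch. The abstract does not say which conditions the sets must meet, what kind of model is trained, or how the per-class sample size is derived from the trained models, so everything specific here is an assumption made for illustration: the condition is that every class appears in both sets, the model is a toy nearest-centroid classifier on a one-dimensional feature, and the chosen size is the smallest candidate whose validation accuracy is within a tolerance of the best observed. The function names (`split_samples`, `choose_class_sizes`, `build_training_set`) and parameters are hypothetical.

```python
import random
from collections import defaultdict

def split_samples(samples, val_fraction=0.2, seed=0):
    """Split (feature, label) pairs per class so that both sets contain
    every class (assumed condition; needs at least two samples per class)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))
    train, val = [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_val = max(1, int(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

def train_centroids(train):
    """Toy stand-in model: the mean feature value of each class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(model, data):
    """Fraction of samples whose nearest centroid matches their label."""
    hits = sum(1 for x, y in data
               if min(model, key=lambda c: abs(x - model[c])) == y)
    return hits / len(data)

def choose_class_sizes(pool, val, candidate_sizes, tolerance=0.01):
    """For each class, train one model per candidate size and keep the
    smallest size scoring within `tolerance` of the best (assumed rule)."""
    by_class = defaultdict(list)
    for x, y in pool:
        by_class[y].append((x, y))
    chosen = {}
    for cls, items in by_class.items():
        others = [s for s in pool if s[1] != cls]
        # Fall back to all available samples if no candidate size fits.
        sizes = [n for n in candidate_sizes if n <= len(items)] or [len(items)]
        scores = {n: accuracy(train_centroids(items[:n] + others), val)
                  for n in sizes}
        best = max(scores.values())
        chosen[cls] = min(n for n, s in scores.items() if s >= best - tolerance)
    return chosen

def build_training_set(pool, sizes):
    """Take the first sizes[cls] samples of each class from the pool."""
    taken, out = defaultdict(int), []
    for x, y in pool:
        if taken[y] < sizes[y]:
            out.append((x, y))
            taken[y] += 1
    return out

# Synthetic two-class data, for demonstration only.
rng = random.Random(1)
samples = ([(rng.gauss(0.0, 1.0), "a") for _ in range(50)]
           + [(rng.gauss(5.0, 1.0), "b") for _ in range(50)])
pool, val = split_samples(samples)
sizes = choose_class_sizes(pool, val, candidate_sizes=[5, 10, 20, 40])
final_training_set = build_training_set(pool, sizes)
print(sizes, len(final_training_set))
```

In this sketch only the class under study is subsampled while the other classes keep all of their pooled samples. That is one of several plausible readings of "for each class to be modeled, multiple training sets are created", not something the abstract specifies.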