Software plays a major role in many organizations, and organizational success depends in part on the quality of the software used. In recent years, many researchers have recognized that statistical classification techniques are well suited to building software quality prediction models, and several statistical models that use complexity metrics as early indicators of software quality have been proposed. At a high level, the software categorization problem is to classify software modules as fault-prone or non-fault-prone: a learner is given a set of training modules together with their class labels (i.e., fault-prone or non-fault-prone) and outputs a classifier, which then takes an unlabeled (hitherto-unseen) module and assigns it to a class. The focus of this paper is to study selected classification techniques widely used for software categorization. Practitioners face a body of approaches and literature offering conflicting advice about the usefulness of these techniques. The techniques evaluated in this paper are principal component analysis, linear discriminant analysis, multiple linear regression, logistic regression, support vector machines, and finite mixture models. In addition, we propose a Bayesian approach based on finite Dirichlet mixture models. We evaluate these approaches experimentally on a real data set, and our results show that different algorithms lead to statistically significantly different results.
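The train-then-classify workflow described above can be sketched with logistic regression, one of the techniques evaluated here. This is a minimal illustration only: the module metrics (scaled lines of code and cyclomatic complexity), the labels, and the learning-rate settings are hypothetical and do not come from the study's data set.

```python
import math

# Hypothetical training modules, each described by two complexity metrics:
# (lines of code / 100, cyclomatic complexity / 10). Label 1 = fault-prone.
train_x = [(1.2, 1.4), (4.5, 3.5), (0.8, 0.6), (6.0, 4.1), (0.95, 0.9), (3.0, 2.2)]
train_y = [0, 1, 0, 1, 0, 1]

def sigmoid(z):
    # Clamp extreme inputs so math.exp never overflows.
    if z < -60.0:
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.1, epochs=2000):
    # Plain stochastic gradient descent on the logistic loss.
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(xs, ys):
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def classify(w, b, module):
    # Assign the unseen module to the fault-prone class (1) if the
    # predicted probability exceeds 0.5, otherwise non-fault-prone (0).
    p = sigmoid(w[0] * module[0] + w[1] * module[1] + b)
    return 1 if p > 0.5 else 0

# Learn a classifier from the labeled modules, then apply it to a
# hitherto-unseen large, complex module.
w, b = train(train_x, train_y)
print(classify(w, b, (5.0, 3.8)))
```

The same fit-then-predict structure applies to the other classifiers compared in the paper; only the decision function and training procedure change.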