Clustering, the unsupervised classification of objects into groups, is a widely used technique in exploratory data analysis. The clustering problem is a very complex one, and a popular heuristic for solving it is the Simulated Annealing (SA) algorithm. SA is an approximation algorithm that involves generating a neighborhood solution by perturbing the current solution in a small, yet meaningful way. This new solution is accepted with a probability of 1 if it is quantitatively better than the current solution, and accepted according to the Metropolis criterion otherwise. Cluster quality is measured using the Sum of Squared Error (SSE) criterion. This paper presents an SA algorithm that uses a new type of perturbation to generate solutions. Whereas most SA clustering algorithms perturb data point memberships directly, our algorithm perturbs a randomly chosen center using Gaussian mutation, and then reassigns data points in a nearest neighbor fashion. Experimental results on a diverse collection of data sets demonstrate that our algorithm has comparable effectiveness to other SA algorithms, while being much faster due to its simplicity.
展开▼