This paper considers Yinyang K-means, a very fast algorithm for the K-means clustering problem. The algorithm uses an initial grouping of cluster centroids together with the triangle inequality to avoid unnecessary distance calculations. We propose two modifications of Yinyang K-means: regrouping the cluster centroids during the run of the algorithm, and replacing the grouping procedure with a method that generates groups of equal size. The influence of these two modifications on the efficiency of Yinyang K-means is evaluated experimentally on seven datasets. The results indicate that the new grouping procedure reduces the runtime of the algorithm; on one of the tested datasets it runs up to 2.8 times faster.
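To illustrate how the triangle inequality lets a K-means iteration skip distance calculations, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a simplified setting in which all centroids form a single group, so each point keeps one upper bound on the distance to its assigned centroid and one lower bound on the distance to every other centroid. The function name `update_assignments` and all parameter names are hypothetical.

```python
import math

def update_assignments(points, centroids, drifts, assign, ub, lb):
    """One pruned assignment pass (simplified: all centroids in one group).

    points    : list of coordinate tuples
    centroids : list of NEW centroid positions (at least two)
    drifts    : drifts[j] = distance centroid j moved in the last update
    assign    : assign[i] = index of the centroid currently assigned to point i
    ub        : ub[i] = upper bound on distance from point i to its centroid
    lb        : lb[i] = lower bound on distance from point i to any other centroid
    Returns the number of points whose distances had to be recomputed.
    """
    max_drift = max(drifts)
    recomputed = 0
    for i, p in enumerate(points):
        # Triangle inequality: the assigned centroid moved by drifts[assign[i]],
        # so the distance to it grew by at most that amount.
        ub[i] += drifts[assign[i]]
        # Every other centroid moved by at most max_drift,
        # so the distance to it shrank by at most that amount.
        lb[i] -= max_drift
        if ub[i] <= lb[i]:
            # The assigned centroid is still the closest: skip all distances.
            continue
        # Bounds are inconclusive: fall back to exact distances.
        recomputed += 1
        d = sorted((math.dist(p, c), j) for j, c in enumerate(centroids))
        assign[i] = d[0][1]
        ub[i] = d[0][0]   # exact distance to the closest centroid
        lb[i] = d[1][0]   # exact distance to the second closest centroid
    return recomputed
```

When the centroids move only slightly, most points pass the `ub[i] <= lb[i]` check and no distances are computed for them. Yinyang K-means refines this idea by keeping one lower bound per group of centroids rather than a single one, which is why the way the groups are formed affects how many calculations can be skipped.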