Clustering is the process of discovering groups within multidimensional data based on similarity, with minimal prior knowledge of the data's structure. In previous work, we presented partSOM, an algorithm based on self-organizing maps (SOM) for clustering distributed datasets. This work extends that approach with a strategy for efficient cluster analysis in distributed databases using SOM and K-means. The proposed strategy applies the SOM algorithm separately to each distributed dataset, each corresponding to a vertical partition of the database, to obtain a representative subset of each local dataset. These representative subsets are then sent to a central site, which fuses the partial results and applies the SOM and K-means algorithms to obtain the final result. Experimental results are compared with those of the traditional SOM and partSOM approaches on different datasets.
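As a rough illustration of the pipeline described above, the following is a minimal sketch and not the authors' implementation. It assumes a tiny one-dimensional SOM written in plain Python, and it assumes that each local site sends the central site one SOM prototype per record (the record's best-matching unit); the abstract does not specify the exact form of the representative subsets or of the fusion step, so concatenating prototypes per record is an assumption. All function names (`train_som`, `local_summary`, `central_cluster`, and so on) and all parameter values are hypothetical.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(codebook, v):
    """Index of the best-matching unit (nearest codebook vector) for v."""
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i], v))

def train_som(data, n_units, epochs=20, seed=0):
    """Train a tiny 1-D SOM; returns a list of n_units weight vectors."""
    rng = random.Random(seed)
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays from 1 towards 0
        lr = 0.5 * frac + 0.01                   # learning rate
        radius = max(n_units / 2.0 * frac, 0.5)  # neighbourhood width
        for v in rng.sample(data, len(data)):    # shuffled pass over the data
            win = bmu(codebook, v)
            for i, w in enumerate(codebook):
                h = math.exp(-((i - win) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (v[d] - w[d])
    return codebook

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [bmu(centroids, p) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def local_summary(partition, n_units, seed=0):
    """Local site: train a SOM, then map each record to its BMU prototype."""
    codebook = train_som(partition, n_units, seed=seed)
    return [codebook[bmu(codebook, row)] for row in partition]

def central_cluster(summaries, n_units, k, seed=0):
    """Central site: fuse per-record prototypes, then apply SOM and K-means."""
    # Assumed fusion: concatenate the prototypes of each record across partitions.
    fused = [sum((list(p) for p in parts), []) for parts in zip(*summaries)]
    codebook = train_som(fused, n_units, seed=seed)
    unit_labels = kmeans(codebook, k, seed=seed)  # cluster the SOM units
    return fused, [unit_labels[bmu(codebook, row)] for row in fused]

# Synthetic example: 60 records with 4 attributes, split into two vertical partitions.
rng = random.Random(1)
records = [[rng.gauss(m, 0.3) for _ in range(4)] for m in (0.0, 5.0) for _ in range(30)]
part_a = [r[:2] for r in records]  # attributes held at site A
part_b = [r[2:] for r in records]  # attributes held at site B
summaries = [local_summary(part_a, 6), local_summary(part_b, 6, seed=1)]
fused, labels = central_cluster(summaries, n_units=6, k=2)
```

In this sketch the raw attribute values never leave the local sites; only SOM prototypes do. A real deployment could transmit each site's codebook plus one unit index per record instead of repeating full prototype vectors, which would further reduce the data sent.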