This paper proposes a new method for random initialization of the EM algorithm for multivariate Gaussian mixture models. In the method, both the mean vector and the covariance matrix of each mixture component are initialized randomly. The mean vector of a component is set to the feature vector that, among a randomly chosen set of candidate feature vectors, lies farthest from the already initialized mixture components as measured by the Mahalanobis distance. In the experiments, the EM algorithm was applied to the clustering problem, and our approach was compared with three well-known EM initialization methods. The experiments were performed on synthetic datasets generated from Gaussian mixtures with varying degrees of overlap between clusters. The results indicate that our method outperforms the other three.
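The mean-selection step described above can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: the function name `init_means_farthest_candidate` and the parameter `n_candidates` are hypothetical, "farthest from the already initialized components" is read as maximizing the Mahalanobis distance to the nearest initialized mean, and a single regularized sample covariance stands in for the randomly initialized component covariances, whose construction is not given here.

```python
import numpy as np

def init_means_farthest_candidate(X, k, n_candidates=10, rng=None):
    """Pick k initial means from the rows of X (hypothetical helper).

    Each new mean is the candidate, drawn at random from X, whose
    Mahalanobis distance to the nearest already chosen mean is largest.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Assumption: one shared, regularized sample covariance replaces the
    # per-component random covariances mentioned in the abstract.
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-6 * np.eye(d)
    cov_inv = np.linalg.inv(cov)

    # The first mean is a uniformly random feature vector.
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        # Draw a random candidate set without replacement.
        idx = rng.choice(n, size=min(n_candidates, n), replace=False)
        cand = X[idx]
        # Squared Mahalanobis distance from each candidate to each mean.
        diff = cand[:, None, :] - np.asarray(means)[None, :, :]
        d2 = np.einsum("cmd,de,cme->cm", diff, cov_inv, diff)
        # Keep the candidate whose nearest mean is farthest away.
        means.append(cand[np.argmax(d2.min(axis=1))])
    return np.asarray(means), cov
```

The returned means and covariance could then be passed as starting parameters to any EM implementation. A larger `n_candidates` pushes the initial means further apart, while a value of 1 reduces the step to plain random selection.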