The performance of the k-means clustering algorithm depends on the choice of distance metric, and Euclidean distance is the metric most commonly used with it. Euclidean distance treats all features as contributing equally to the clustering and does not accurately reflect the dissimilarity between samples. To address this shortcoming, this paper proposes a k-means clustering method that incorporates the coefficient of variation (CV-k-means), which uses a coefficient-of-variation weight vector to reduce the influence of irrelevant features. Experimental results show that the proposed method produces better clustering results than the k-means algorithm.
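The idea of weighting features by their coefficient of variation can be sketched as follows. The abstract does not give the exact weighting formula, so this sketch rests on assumptions: each feature's weight is taken as its coefficient of variation (standard deviation divided by the absolute mean) normalised so that the weights sum to 1, and the weights enter a weighted squared Euclidean distance. The function names `cv_weights` and `cv_kmeans` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cv_weights(X):
    """Per-feature weights from the coefficient of variation (std / |mean|).

    Assumption of this sketch: weights are normalised to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    mean = np.abs(X.mean(axis=0))
    std = X.std(axis=0)
    # The CV is undefined for a zero mean; fall back to the raw std there.
    cv = std / np.where(mean > 0, mean, 1.0)
    total = cv.sum()
    if total == 0:
        # All features are constant: use uniform weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return cv / total


def assign(X, centers, w):
    """Label each sample with the nearest centre under the weighted distance."""
    diff = X[:, None, :] - centers[None, :, :]
    d2 = (w * diff ** 2).sum(axis=2)  # weighted squared Euclidean distance
    return d2.argmin(axis=1)


def cv_kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-style k-means using CV-weighted Euclidean distance (a sketch)."""
    X = np.asarray(X, dtype=float)
    w = cv_weights(X)
    rng = np.random.default_rng(seed)
    # Initialise centres with k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers, w)
        # The per-feature mean minimises the weighted squared distance too,
        # so the centre update is the same as in plain k-means.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers, w), centers, w
```

Under these assumptions, a feature whose values vary little relative to their mean receives a small weight and contributes little to the distance, which is one way to read "reducing the influence of irrelevant features". Whether the paper normalises the weights or applies them in exactly this form cannot be determined from the abstract.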