The K-means algorithm (KMEANS) is a classical and widely used clustering algorithm, but it has two shortcomings: it is sensitive to the initial cluster centers, and the number of clusters must be specified manually. As a result, the accuracy of its clustering results is relatively low. To address these two shortcomings and to improve accuracy, stability, and overall clustering performance, an improved K-means algorithm is proposed. The algorithm determines the optimal value of K using a defined index, the average maximum inter-class similarity. Points of relatively high density among all data points are taken as candidate cluster centers, and the two candidates with the highest density are used as cluster centers for a preliminary clustering pass, after which the current cluster centers are updated. If the newly computed average maximum inter-class similarity is smaller than the value from the previous computation, the next cluster center is selected dynamically from the candidates according to a relative-distance principle; otherwise, the current cluster centers are taken as the optimal initial cluster centers for K-means clustering. Experimental results show that the improved algorithm not only effectively improves the accuracy and stability of clustering but also shortens the clustering time, giving it certain technical advantages and application prospects.
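The procedure described in the abstract can be sketched roughly as follows. The abstract does not give the formulas, so several details here are assumptions made only for illustration: the average maximum inter-class similarity is taken to be a Davies-Bouldin-style index, density is the number of neighbours within a radius (defaulting to half the mean pairwise distance), candidates are the points with at least average density, the relative-distance principle is read as "pick the candidate farthest from its nearest current center", and the preliminary clustering step runs ordinary K-means iterations to convergence. When the index stops decreasing, the sketch keeps the last set of centers that lowered it, which is one possible reading of "current cluster centers". All function names are hypothetical.

```python
import math

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def lloyd(X, centers, iters=100):
    """Standard K-means iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in X]
        new = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(X, labels) if l == j]
            # An empty cluster keeps its previous center.
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else c)
        if all(math.dist(a, b) < 1e-9 for a, b in zip(new, centers)):
            break
        centers = new
    return centers, [nearest(p, centers) for p in X]

def avg_max_similarity(X, centers, labels):
    """Davies-Bouldin-style index (assumed form); lower is better."""
    k = len(centers)
    scatter = []
    for j in range(k):
        d = [math.dist(p, centers[j]) for p, l in zip(X, labels) if l == j]
        scatter.append(sum(d) / len(d) if d else 0.0)
    worst = [max((scatter[i] + scatter[j])
                 / max(math.dist(centers[i], centers[j]), 1e-12)
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

def improved_kmeans(X, radius=None, k_max=10):
    """Grow the set of initial centers until the index stops decreasing."""
    X = [tuple(map(float, p)) for p in X]
    n = len(X)
    if radius is None:
        # Assumed neighbourhood radius: half the mean pairwise distance.
        radius = sum(math.dist(p, q) for p in X for q in X) / (n * n) / 2
    density = [sum(math.dist(p, q) < radius for q in X) for p in X]
    avg = sum(density) / n
    # Assumed candidate rule: at least average density, densest first.
    cand = sorted((i for i in range(n) if density[i] >= avg),
                  key=lambda i: -density[i])
    if len(cand) < 2:  # fallback so that two starting centers always exist
        cand = sorted(range(n), key=lambda i: -density[i])
    centers, labels = lloyd(X, [X[i] for i in cand[:2]])
    best = (avg_max_similarity(X, centers, labels), centers)
    while len(centers) < min(k_max, len(cand)):
        # Assumed relative-distance rule: farthest candidate from its nearest center.
        far = max(cand, key=lambda i: min(math.dist(X[i], c) for c in centers))
        centers, labels = lloyd(X, centers + [X[far]])
        score = avg_max_similarity(X, centers, labels)
        if score >= best[0]:
            break  # index no longer decreases: stop adding centers
        best = (score, centers)
    # Final K-means run from the selected initial centers.
    return lloyd(X, best[1])

# Usage on three small, well-separated groups of points.
offsets = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]
final_centers, final_labels = improved_kmeans(points)
print(len(final_centers), final_labels)
```

The sketch requires Python 3.8 or later for `math.dist`. The quadratic pairwise-distance computation is acceptable for small data sets only; the abstract does not say how the original method computes density.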