Determining the optimal number of clusters in a dataset is an important and difficult problem in cluster analysis. To determine this number more effectively, this paper proposes a method that combines an improved K-means algorithm with a validity index Q(c) that does not depend on any specific clustering algorithm. Theoretical analysis and experimental results show that the method is effective and performs well.
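The general idea of scanning candidate cluster counts and scoring each partition with a validity index can be illustrated with a minimal sketch. The abstract specifies neither the K-means improvement nor the definition of Q(c), so this is not the paper's method: farthest-first seeding is assumed here as a stand-in for the improvement, and the Calinski-Harabasz index is swapped in for Q(c). All function names and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not the paper's improvement):
    start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd iterations from farthest-first seeds; returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    return labels, centers


def calinski_harabasz(points, labels, centers):
    """Stand-in validity index (NOT the paper's Q(c)): ratio of between-cluster
    to within-cluster scatter, each divided by its degrees of freedom."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def best_cluster_count(points, k_min=2, k_max=6):
    """Score every k in [k_min, k_max] and return the best k with all scores."""
    scores = {}
    for k in range(k_min, k_max + 1):
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 5x5 grids of points (hypothetical example).
offsets = [(0.2 * i, 0.2 * j) for i in range(-2, 3) for j in range(-2, 3)]
demo_points = [(cx + dx, cy + dy)
               for cx, cy in [(0, 0), (10, 0), (5, 10)]
               for dx, dy in offsets]
best_k, scores = best_cluster_count(demo_points)
print(best_k, scores)
```

On this toy data the stand-in index peaks at three clusters, matching the three grids. Because the index is computed only from the final partition, any other clustering routine or validity index could be substituted without changing `best_cluster_count`.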