To improve the clustering quality of the traditional CURE (clustering using representatives) algorithm, this paper proposes a modified CURE algorithm based on information entropy. First, the algorithm pre-clusters the sample data set with the K-means algorithm. Next, it introduces an entropy-based similarity metric. This metric uses the information carried by the elements of each cluster to measure the relationship between clusters and to describe the data distribution. Finally, it applies different strategies for selecting representative points in the low-level and high-level clustering stages. Experimental results on UCI and synthetic data sets indicate that the proposed algorithm improves clustering accuracy to some extent. They also show higher clustering efficiency than the original CURE algorithm on large data sets.
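The abstract gives neither the entropy formula nor the exact representative-selection strategies. The Python sketch below therefore illustrates two building blocks under stated assumptions. The first is the classic CURE representative-point step: pick up to `c` well-scattered points with the farthest-point heuristic, then shrink each one toward the cluster centroid by a fraction `alpha`. The second is a *hypothetical* entropy-based merge cost. It discretizes each attribute into bins of width `bin_width` and measures the weighted increase in Shannon entropy caused by merging two clusters, which is a weighted Jensen-Shannon-style divergence. This cost drives agglomerative merging of pre-clusters, which are assumed to come from a K-means run that is not shown here. All function and parameter names are illustrative and do not come from the paper. The entropy criterion is a stand-in, not the authors' metric.

```python
import math
from collections import Counter


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def select_representatives(points, c=4, alpha=0.3):
    """Standard CURE step: choose up to c well-scattered points with the
    farthest-point heuristic, then shrink each toward the centroid by alpha."""
    mean = centroid(points)
    # First representative: the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, mean))]
    while len(chosen) < min(c, len(points)):
        # Next representative: the point whose nearest chosen point is farthest.
        chosen.append(max(points, key=lambda p: min(math.dist(p, q) for q in chosen)))
    # Shrink toward the centroid: p + alpha * (mean - p).
    return [tuple(x + alpha * (m - x) for x, m in zip(p, mean)) for p in chosen]


def entropy(values):
    """Shannon entropy (in bits) of a list of hashable values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def cluster_entropy(points, bin_width):
    """HYPOTHETICAL: bin each attribute with width bin_width and sum the
    per-attribute entropies (treats attributes as independent)."""
    dim = len(points[0])
    return sum(entropy([math.floor(p[i] / bin_width) for p in points])
               for i in range(dim))


def merge_cost(a, b, bin_width=1.0):
    """HYPOTHETICAL entropy-based dissimilarity: the weighted entropy increase
    n_ab*H(a+b) - n_a*H(a) - n_b*H(b). It is 0 when a and b have identical
    binned distributions and grows as the distributions diverge."""
    merged = list(a) + list(b)
    return (len(merged) * cluster_entropy(merged, bin_width)
            - len(a) * cluster_entropy(a, bin_width)
            - len(b) * cluster_entropy(b, bin_width))


def agglomerate(pre_clusters, k, bin_width=1.0):
    """Merge pre-clusters (assumed to come from K-means) until k clusters remain,
    always merging the pair with the lowest hypothetical entropy merge cost."""
    clusters = [list(c) for c in pre_clusters]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]],
                                                    clusters[ij[1]], bin_width))
        other = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(other)
    return clusters
```

As a usage example, suppose two well-separated groups have each been split into two pre-clusters. Merging pre-clusters from the same group costs 0 bits, because their points fall into the same bins. Merging across groups costs more. `agglomerate(pre_clusters, k=2)` therefore recovers the two groups. After merging, a full CURE-style pipeline would call `select_representatives` on each resulting cluster. The paper's different low-level and high-level selection strategies are not reproduced here.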