To address two weaknesses of the K-means algorithm, namely that the number of clusters must be fixed in advance and that the result is sensitive to the choice of initial centers, this paper adopts a density-based approach. Two parameters are set: the Eps-neighborhood radius, and minpts, the minimum number of data objects that an Eps-neighborhood must contain. These parameters are used to exclude outliers, and the non-duplicated core objects are taken as the initial cluster centers. The ratio of within-cluster distance to between-cluster distance serves as the criterion function, and the number of clusters at which this function reaches its minimum is taken as the optimal number of clusters. These improvements overcome the stated weaknesses of the K-means algorithm. Finally, several examples illustrate the application of the improved algorithm. They show that, compared with the traditional algorithm, the improved algorithm achieves higher clustering accuracy and produces clusters that are compact internally and well separated from one another.
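The procedure summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not define "non-duplicated core objects" or the exact distance terms, so the sketch assumes (a) a core object is a point with at least `minpts` points (itself included) within `eps`, (b) "non-duplicated" means no two chosen core objects lie within `eps` of each other, (c) outliers are points not within `eps` of any core object, and (d) the criterion is the mean point-to-center distance divided by the mean pairwise distance between centers. The function names `density_seeds`, `kmeans`, `criterion`, and `choose_k` are hypothetical.

```python
import numpy as np

def density_seeds(X, eps, minpts):
    """Return (candidate centers, mask of non-outlier points). Assumed definitions."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    counts = (d <= eps).sum(axis=1)           # neighborhood size, point itself included
    core = np.flatnonzero(counts >= minpts)   # indices of core objects
    # Assumption: visit core objects from densest to sparsest and keep only
    # those farther than eps from every already chosen one ("non-duplicated").
    order = core[np.argsort(-counts[core], kind="stable")]
    chosen = []
    for i in order:
        if all(d[i, j] > eps for j in chosen):
            chosen.append(i)
    # Assumption: a point is an outlier if no core object lies within eps of it.
    keep = (d[:, core] <= eps).any(axis=1)
    return X[chosen], keep

def kmeans(X, init_centers, n_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    # Final assignment against the final centers.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

def criterion(X, labels, centers):
    """Assumed form: mean within-cluster distance / mean between-center distance."""
    within = np.linalg.norm(X - centers[labels], axis=1).mean()
    pair = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    between = pair[np.triu_indices(len(centers), 1)].mean()
    return within / between

def choose_k(X, eps, minpts):
    """Try k = 2..number of seeds; return (score, k, labels, centers) with the
    smallest criterion value, or None if fewer than two seeds were found."""
    seeds, keep = density_seeds(X, eps, minpts)
    data = X[keep]                             # outliers are excluded from clustering
    best = None
    for k in range(2, len(seeds) + 1):
        labels, centers = kmeans(data, seeds[:k])
        score = criterion(data, labels, centers)
        if best is None or score < best[0]:
            best = (score, k, labels, centers)
    return best
```

Calling `choose_k(X, eps, minpts)` on a float array `X` of shape `(n_samples, n_features)` returns the best-scoring clustering of the non-outlier points. Using the first `k` seeds for each trial `k` is a simplification chosen for the sketch; the paper may select or merge seeds differently.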