Clustering is a fundamental problem in big data analysis and data mining. In 2014, the paper "Clustering by fast search and find of density peaks", published in the journal Science, proposed a simple yet effective clustering algorithm based on the idea that cluster centers have a higher density than their neighbors and lie at a relatively large distance from points of higher density. That algorithm can detect clusters of arbitrary shape and differing density, but its results are very sensitive to the tunable parameter dc, which has to be chosen empirically. In this paper, we propose an improved density-peak clustering algorithm that introduces density estimation entropy to optimize the parameter dc adaptively. The time complexity of our algorithm is super-linear with respect to the size of the dataset. Our theoretical analysis and comparative experiments show that the improved method removes the need to set the parameter by hand and achieves relatively better clustering performance than the original algorithm.
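The density-peak idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel in `local_density`, the Shannon-entropy form in `density_entropy`, the candidate grid searched by `choose_dc`, and the rho * delta ranking in `pick_centers` are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]

def local_density(dist, dc):
    """Assumed Gaussian-kernel density; the self term (distance 0) adds 1."""
    return [sum(math.exp(-(d / dc) ** 2) for d in row) for row in dist]

def delta_to_denser(dist, rho):
    """Distance from each point to its nearest point of higher density.
    A point with no denser point gets the largest distance in its row."""
    n = len(rho)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return delta

def density_entropy(rho):
    """Assumed form: Shannon entropy of the normalized densities."""
    total = sum(rho)
    return -sum((r / total) * math.log(r / total) for r in rho)

def choose_dc(dist, candidates):
    """Pick the candidate dc with the lowest density entropy (assumption)."""
    return min(candidates,
               key=lambda dc: density_entropy(local_density(dist, dc)))

def pick_centers(rho, delta, k):
    """Rank points by rho * delta and keep the top k as cluster centers."""
    order = sorted(range(len(rho)),
                   key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:k]
```

A possible call sequence is `dist = pairwise_distances(points)`, then `dc = choose_dc(dist, candidates)`, `rho = local_density(dist, dc)`, `delta = delta_to_denser(dist, rho)`, and finally `pick_centers(rho, delta, k)`. The full distance matrix makes this sketch quadratic in the number of points, which is consistent with the super-linear complexity stated in the abstract but says nothing about the exact bound the paper derives.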