Parallel K-means Clustering Algorithm Based on Adaptive Cuckoo Search

Journal: 《计算机应用研究》 (Application Research of Computers)

Abstract

The K-means clustering algorithm is sensitive to its initial cluster centers, so its results easily fall into local optima and clustering accuracy suffers. To address this, this paper proposes an improved K-means clustering algorithm based on adaptive cuckoo search and parallelizes it with the MapReduce programming model. Accuracy experiments and efficiency experiments were each run 10 times per data set on a Hadoop distributed computing platform. The results show that: a) on the four UCI standard data sets used, the average clustering accuracy is higher than that of both the original K-means algorithm and the K-means algorithm improved by particle swarm optimization (PSO-Kmeans); b) on the five random data sets of increasing size, when the data volume is large, the average running efficiency is significantly better than that of the original serial K-means algorithm and slightly better than that of the parallel PSO-Kmeans algorithm. The authors conclude that the algorithm clusters well in big-data scenarios.
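The abstract does not give implementation details, so the following is only a minimal sketch of how one K-means iteration can be split into a map step (assign each point to its nearest center) and a reduce step (average the points of each cluster). It runs in plain Python rather than on Hadoop, and every name in it (`map_assign`, `reduce_mean`, `kmeans_iteration`, `sse`) is a hypothetical choice for illustration, not the paper's code. The `sse` function is an assumed example of the kind of objective a search method such as cuckoo search might use to compare candidate sets of centers; the paper's actual fitness function is not stated here.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_assign(point, centroids):
    """Map step (assumed form): emit (index of nearest centroid, point)."""
    idx = min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))
    return idx, point


def reduce_mean(points):
    """Reduce step (assumed form): average all points sharing one key."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_iteration(data, centroids):
    """One K-means iteration written as map, group by key, reduce."""
    groups = defaultdict(list)
    for point in data:
        idx, p = map_assign(point, centroids)
        groups[idx].append(p)
    # An empty cluster keeps its previous centroid.
    return [
        reduce_mean(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    start = [(1.0, 1.0), (9.0, 1.0)]
    updated = kmeans_iteration(data, start)
    print(updated, sse(data, start), sse(data, updated))
```

In a real MapReduce job the grouping by key would be done by the framework's shuffle phase, and the map and reduce functions would run on separate nodes. A lower `sse` value for a candidate set of centers indicates a tighter clustering, which is why it is a plausible quantity for a search procedure to minimize.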
