首页> 中文期刊> 《计算机应用与软件》 >面向大规模数据快速聚类K-means算法的研究

面向大规模数据快速聚类K-means算法的研究

     

摘要

To further enhance the efficiency of K-means clustering algorithm for large-scale data, combined with MapReduce computational model, a parallel clustering method is proposed, which uses Hash function to extract samples and then obtains initial center by Pam algorithm.The sample extracted by Hash function can fully reflect the statistical characteristics of the data, using Pam algorithm to obtain the initial clustering center, and improve the traditional clustering algorithm to rely on the initial center of the problem.It uses the Pam algorithm to obtain the initial clustering center, and improves the problem of that the traditional clustering algorithms rely on the initial center.The experimental results show that the proposed algorithm can effectively improve the clustering quality and efficiency, and is suitable for the clustering analysis of large-scale data.%为进一步提高K-means算法对大规模数据聚类的效率,结合MapReduce计算模型,提出一种先利用Hash函数进行样本抽取,再利用Pam算法获取初始中心的并行聚类方法.通过Hash函数抽取的样本能充分反映数据的统计特性,使用Pam算法获取初始聚类中心,改善了传统聚类算法依赖初始中心的问题.实验结果表明该算法有效提高了聚类质量和执行效率,适用于对大规模数据的聚类分析.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号