Source: 《应用科学学报》 (Journal of Applied Sciences)

Design of the HKM Clustering Algorithm on the Hadoop Cloud Computing Platform

Abstract

To address the scalability problems that the traditional K-means clustering algorithm faces on large-scale data sets, a Hadoop K-means (HKM) clustering algorithm is proposed. The algorithm first uses sample density to remove the influence of isolated points and noise points in the data set. It then selects K initial centers by maximizing the minimum distance, which optimizes the initial cluster centers. Finally, the algorithm is parallelized with the MapReduce programming model of the Hadoop cloud computing platform. Experimental results show that the algorithm achieves high accuracy and stability in its clustering results and also resolves the scalability problems that traditional clustering algorithms encounter when processing large-scale data.
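The three stages named in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not define the density measure, any thresholds, or the seed for the first center, so the neighbor-count density, the `radius` and `min_neighbors` parameters, the choice of the first point as seed, and all function names below are assumptions made for illustration. The map and reduce steps are plain Python functions that mimic the shape of a MapReduce job and do not call Hadoop.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def drop_sparse_points(points, radius, min_neighbors):
    """Keep points that have at least `min_neighbors` other points within `radius`.

    Assumption: this neighbor count stands in for the paper's "sample density".
    """
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if i != j and dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


def max_min_centers(points, k):
    """Pick k centers; each new one maximizes its distance to the nearest chosen center."""
    centers = [points[0]]  # assumption: seed with the first remaining point
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) for every point."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield idx, p


def reduce_phase(pairs, centers):
    """Reduce step: replace each center by the mean of the points assigned to it."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)  # centers with no assigned points stay unchanged
    for idx, members in groups.items():
        new_centers[idx] = tuple(sum(col) / len(members) for col in zip(*members))
    return new_centers


def hkm_sketch(points, k, radius, min_neighbors, max_iter=20, tol=1e-6):
    """Filter sparse points, choose initial centers, then iterate map/reduce rounds."""
    clean = drop_sparse_points(points, radius, min_neighbors)
    centers = max_min_centers(clean, k)
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(clean, centers), centers)
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:  # stop once no center moves noticeably
            break
    return centers
```

For example, `hkm_sketch([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], k=2, radius=2.0, min_neighbors=1)` discards the isolated point `(50, 50)` before any center is chosen, so it cannot pull a centroid away from the two dense groups. In a real Hadoop job the map step would run on separate input splits and the reduce step would receive the pairs grouped by center index; the sketch keeps both in one process only to show the data flow.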
