Journal: Computer Technology and Development (《计算机技术与发展》)

Implementation of the K-means Clustering Algorithm Based on Hadoop

Abstract

To address the high time complexity of the traditional K-means clustering algorithm, this paper proposes implementing K-means for large data volumes on the Hadoop platform using the MapReduce programming model. The Map function computes the distance from each record to every centroid and labels the record with the cluster of its nearest centroid. The Reduce function updates the centroids, computes the distance from each record to the centroid of its cluster, and accumulates the sum of these distances. Experiments verify that, when processing large data sets, K-means running in parallel on a Hadoop cluster does reduce the time complexity compared with the traditional serial algorithm, and that it shows good stability and scalability.
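The division of work between Map and Reduce described in the abstract can be illustrated with a minimal in-process sketch. It is not the paper's Hadoop implementation: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) are hypothetical, the shuffle phase is simulated with a dictionary, Euclidean distance is assumed, and the distance sum is assumed to be measured against the centroid each record was assigned to in the Map step.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_map(record, centroids):
    """Map step: label one record with the index of its nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda i: dist(record, centroids[i]))
    return nearest, record


def kmeans_reduce(cluster_id, records, old_centroid):
    """Reduce step: average one cluster's records into a new centroid and
    sum their distances to the centroid they were assigned to (assumption)."""
    n = len(records)
    new_centroid = tuple(sum(col) / n for col in zip(*records))
    distance_sum = sum(dist(r, old_centroid) for r in records)
    return cluster_id, new_centroid, distance_sum


def kmeans_iteration(records, centroids):
    """Simulate one MapReduce round in-process: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        cluster_id, value = kmeans_map(record, centroids)
        groups[cluster_id].append(value)

    new_centroids = list(centroids)  # clusters with no records keep their centroid
    total_distance = 0.0
    for cluster_id, members in groups.items():
        _, centroid, distance_sum = kmeans_reduce(
            cluster_id, members, centroids[cluster_id]
        )
        new_centroids[cluster_id] = centroid
        total_distance += distance_sum
    return new_centroids, total_distance
```

In a driver loop, `kmeans_iteration` would be called repeatedly until the centroids or the accumulated distance stop changing; on a real cluster each call would correspond to one MapReduce job, with the current centroids distributed to every mapper.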
