Source: Journal of Grid Computing

Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework

Abstract

This paper addresses clustering for large-scale data mining, aiming to improve both the accuracy and the efficiency of the clustering algorithm. The traditional clustering algorithm is first modified to raise accuracy, and the modified algorithm is then parallelized to raise efficiency. To improve accuracy, a density-based incremental K-means clustering algorithm is proposed, building on the K-means algorithm. First, the density of each data point is calculated, and each basic cluster is formed from a center point whose density is not less than a given threshold together with the points within its density range. Next, basic clusters are merged according to the distance between their cluster centers. Finally, points that have not been assigned to any cluster are assigned to the cluster nearest to them. To improve efficiency and reduce time complexity, a distributed database is used to simulate a shared memory space, and the algorithm is parallelized on the Hadoop cloud computing platform. Simulation results show that the clustering accuracy of the proposed algorithm is more than 10% higher than that of the other two algorithms compared.
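The abstract names four steps (density calculation, basic-cluster formation, merging by center distance, assignment of leftover points) but gives no formulas. The following single-machine Python sketch shows one way those steps might fit together. It is not the paper's implementation: the function name and the parameters `radius`, `min_density`, and `merge_dist` are hypothetical, density is assumed to mean the number of other points within `radius`, and cluster centers are assumed to be centroids. The incremental and Hadoop-parallel aspects are not modeled.

```python
import math


def density_seeded_clusters(points, radius, min_density, merge_dist):
    """Return one cluster label per point (illustrative sketch only)."""
    n = len(points)

    # Step 1 (assumed definition): density = number of other points within `radius`.
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbors]

    # Step 2: build basic clusters around dense, still-unassigned centers,
    # visiting candidates from densest to sparsest.
    assigned = [False] * n
    clusters = []  # each cluster is a list of point indices
    for i in sorted(range(n), key=lambda k: -density[k]):
        if assigned[i] or density[i] < min_density:
            continue
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)

    def centroid(members):
        dims = len(points[0])
        return tuple(sum(points[j][d] for j in members) / len(members) for d in range(dims))

    # Step 3: repeatedly merge any two basic clusters whose centroids
    # lie within `merge_dist` of each other.
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if math.dist(centroid(clusters[a]), centroid(clusters[b])) <= merge_dist:
                    clusters[a].extend(clusters.pop(b))
                    merged = True
                    break
            if merged:
                break

    # Step 4: label clustered points, then attach leftovers to the nearest centroid.
    labels = [-1] * n
    for c, members in enumerate(clusters):
        for j in members:
            labels[j] = c
    centers = [centroid(m) for m in clusters]
    for i in range(n):
        if labels[i] == -1 and centers:
            labels[i] = min(range(len(centers)), key=lambda c: math.dist(points[i], centers[c]))
    return labels


# Two tight groups plus one stray point that no dense center covers.
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (3, 3)]
demo_labels = density_seeded_clusters(demo_points, radius=1.5, min_density=3, merge_dist=3.0)
```

In this sketch the stray point `(3, 3)` joins a cluster only in the final step, and raising `merge_dist` far enough collapses both groups into one cluster. The pairwise distance computation is quadratic in the number of points, which is the kind of cost a parallel implementation would aim to distribute.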
