Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework

Abstract

This paper addresses clustering for big data mining, aiming to improve both the accuracy and the efficiency of the clustering algorithm. The traditional clustering algorithm is first modified to raise accuracy, and the modified algorithm is then parallelized to raise efficiency. For accuracy, a density-based incremental K-means clustering algorithm is proposed on the basis of the K-means algorithm. First, the density of each data point is calculated, and each basic cluster is formed from a center point whose density is not less than a given threshold together with the points within its density range. Next, basic clusters are merged according to the distance between their cluster centers. Finally, points that have not been assigned to any cluster are assigned to the cluster nearest to them. For efficiency, and to reduce the time complexity of the algorithm, a distributed database is used to simulate a shared memory space, and the algorithm is parallelized on the Hadoop cloud computing platform. Simulation results show that the clustering accuracy of the proposed algorithm exceeds that of the two comparison algorithms by more than 10%.
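
The three steps described in the abstract (form basic clusters around dense points, merge basic clusters with nearby centers, assign leftover points to the nearest cluster) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function name `density_based_clusters`, the parameters `radius`, `min_density`, and `merge_dist`, the neighbor-count definition of density, and the use of cluster centroids for the final assignment are all assumptions made for illustration. The incremental updates and the Hadoop parallelization mentioned in the abstract are not covered here.

```python
import numpy as np


def density_based_clusters(points, radius, min_density, merge_dist):
    """Hypothetical sketch of the three steps; returns one label per point."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a small demo only).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Assumed density definition: number of points within `radius`, self included.
    density = (dists <= radius).sum(axis=1)

    labels = np.full(n, -1)
    centers = []  # indices of the points chosen as basic-cluster centers
    # Step 1: visit points from densest to sparsest. An unassigned point whose
    # density reaches the threshold becomes a center and claims its
    # still-unassigned neighbors within `radius`.
    for i in np.argsort(-density, kind="stable"):
        if labels[i] == -1 and density[i] >= min_density:
            members = (dists[i] <= radius) & (labels == -1)
            labels[members] = len(centers)
            centers.append(i)
    if not centers:
        raise ValueError("no point reaches min_density; relax the parameters")

    # Step 2: merge basic clusters whose centers lie within `merge_dist`,
    # using a small union-find over the basic-cluster ids.
    parent = list(range(len(centers)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            if dists[centers[a], centers[b]] <= merge_dist:
                parent[find(b)] = find(a)

    roots = sorted({find(a) for a in range(len(centers))})
    remap = {root: k for k, root in enumerate(roots)}
    assigned = labels != -1
    labels[assigned] = [remap[find(c)] for c in labels[assigned]]

    # Step 3: give every leftover point to the cluster with the nearest
    # centroid (an assumption; the abstract only says "nearest").
    centroids = np.array([points[labels == c].mean(axis=0)
                          for c in range(len(roots))])
    for i in np.where(~assigned)[0]:
        labels[i] = np.argmin(np.linalg.norm(centroids - points[i], axis=1))
    return labels


if __name__ == "__main__":
    # Two tight groups plus one isolated point.
    blob = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05]])
    data = np.vstack([blob, blob + 5.0, [[1.0, 1.0]]])
    print(density_based_clusters(data, radius=0.5, min_density=3, merge_dist=1.0))
```

In this sketch the number of clusters is not fixed in advance; it falls out of the density threshold and the merge distance, which is one plausible reading of why a density step might help K-means with its sensitivity to the initial choice of centers.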

Bibliographic details

Source: Journal of Grid Computing, 2020, Issue 2, 12 pages
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Improved k-means clustering algorithm; Big data mining; Hadoop parallel framework; Shared storage space; Parallel computing; Parallelization; Distributed database
