Single-Pass K-means Clustering Algorithm Based on MapReduce

Tang Hao (唐浩); Yang Yuwang (杨余旺); Xin Zhibin (辛智斌)

Abstract

Fitting K-means into the MapReduce framework can greatly improve its ability to process large datasets. However, K-means needs multiple iterations to reach an acceptable clustering result. Each iteration is executed as an independent map job in which the whole dataset must be read from and written to slow disks, resulting in high I/O overhead, which is inconsistent with the design concept of the MapReduce framework. Therefore, a single-pass K-means clustering algorithm based on MapReduce, called MRSK, is proposed. It reads the data in a single pass using a streaming single-pass algorithm, and uses the K-means++ seeding algorithm to obtain the initial cluster centers. Following a theoretical analysis of the complexity of MRSK, a series of tests and analyses of MRSK is conducted. The experimental results show that, compared with existing MapReduce-based and stream-based K-means variants, MRSK achieves both faster execution and higher clustering quality.
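The abstract names K-means++ seeding as the way the initial cluster centers are obtained. As an illustration only, the following is a minimal in-memory sketch of standard K-means++ seeding (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen). It is not the authors' MRSK implementation and does not involve MapReduce; the function names `squared_distance` and `kmeans_pp_seeds` and the `rng` parameter are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers from `points` using K-means++ (D^2 weighting).

    Hypothetical helper for illustration; `points` is a list of tuples.
    """
    if rng is None:
        rng = random.Random(0)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, points[idx]))
              for p, old in zip(points, d2)]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
    print(kmeans_pp_seeds(data, 3, random.Random(42)))
```

Because a point that is already a center has weight zero, the sketch never picks the same point twice as long as the data contains at least k distinct points. Each seeding round scans the data once, so a distributed or single-pass variant would need a different sampling strategy than this loop.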

Bibliographic Details

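Source
Computer Technology and Development (《计算机技术与发展》) | 2017, No. 9 | pp. 26-30 | 5 pages
Authors
Tang Hao (唐浩); Yang Yuwang (杨余旺); Xin Zhibin (辛智斌)
Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology (南京理工大学计算机科学与工程学院), Nanjing, Jiangsu 210094

淮海集团工业有限公司, Changzhi, Shanxi 046000

Original format: PDF
Language of text: Chinese
CLC classification: Algorithm theory
Keywords
MapReduce framework; data clustering; K-means++; Mahout; single-pass technique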
