An Improved K-means Algorithm for Accelerating Big Data Clustering (加速大数据聚类K-means算法的改进)

Han Yan (韩岩); Li Xiao (李晓)

Abstract

To handle large-scale data clustering effectively, this paper proposes a parallel K-means clustering method that first samples the data at random and then applies the max-min distance method to choose the initial cluster centers. Sampling keeps the clustering from becoming trapped in a local solution, and the max-min distance method steers the initial cluster centers toward the optimum. Extensive experiments show that, in both stand-alone and cluster environments, the method is less sensitive to the choice of initial cluster centers, improves clustering accuracy, reduces the number of iterations, and shortens clustering time.
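The abstract gives only the outline of the method, so the following is a minimal single-machine sketch of that outline, not the authors' implementation. It assumes Euclidean distance, a fixed number of clusters `k`, and a random first seed. The function names (`max_min_centers`, `kmeans`, `sampled_max_min_kmeans`) and the `sample_size` parameter are hypothetical. The MapReduce parallelization described in the paper is not reproduced here.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def max_min_centers(points, k, rng):
    """Pick k seeds: one at random, then repeatedly the point whose
    distance to its nearest already-chosen seed is largest."""
    centers = [rng.choice(points)]
    # nearest[i] = distance from points[i] to its closest seed so far
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations; returns (centers, iterations used)."""
    centers = [tuple(c) for c in centers]
    for it in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # New center = coordinate-wise mean; an empty cluster keeps its center
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments stopped changing
            return centers, it
        centers = new_centers
    return centers, max_iter


def sampled_max_min_kmeans(points, k, sample_size, seed=0):
    """Sample first, seed on the sample with max-min distance,
    then run K-means on the full data set."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    seeds = max_min_centers(sample, k, rng)
    return kmeans(points, seeds)
```

Seeding on a sample rather than on the full data set keeps the seeding cost proportional to `k * sample_size`, while the max-min rule spreads the seeds apart so that the subsequent iterations start from well-separated centers.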

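Bibliographic details

Source
Computer Engineering and Design (《计算机工程与设计》) | 2015, No. 5 | pp. 1317-1320 | 4 pages
Authors
Han Yan (韩岩); Li Xiao (李晓)
Affiliations

Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011;

School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049;

Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011

Original format: PDF
Language: Chinese
CLC classification: Programming, software engineering
Keywords
K-means algorithm; random sampling; max-min distance method; MapReduce; parallelization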
