Journal: 《河南科学》 (Henan Science)

Research on Initial Center Selection for the K-means Clustering Algorithm

         

Abstract

In the traditional K-means clustering algorithm, the initial cluster centers are chosen at random, and real data sets may contain isolated points (outliers). As a result, each run can produce a different clustering of different quality, and the algorithm sometimes becomes trapped in a local optimum. To address these problems, researchers have tried using distance-based methods to identify isolated points and to determine the initial cluster centers. This approach is unsound, because an isolated point is not only far from other points but also has sparse surroundings. In addition, when the data set is very large and has many features, the algorithm requires a large amount of computation, consumes substantial computing resources, and runs too slowly. This paper studies the traditional K-means clustering algorithm and proposes a method, based on density parameters and distance theory, for determining the initial cluster centers and identifying isolated points, thereby improving the traditional algorithm.
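The abstract gives no formulas, so the following is only a minimal sketch of how density and distance might be combined. It makes three assumptions that are not the paper's stated method. First, a point's density is the number of other points within a radius `eps`. Second, a point with fewer than `min_neighbors` such neighbours is treated as isolated. Third, initial centers are taken from the remaining points by starting at the densest one and then repeatedly picking the point farthest from the centers already chosen. The function name `density_based_init` and both parameters are hypothetical.

```python
import math

def density_based_init(points, k, eps, min_neighbors=2):
    """Return (initial_centers, outlier_indices) for a list of coordinate tuples."""
    n = len(points)
    # Pairwise Euclidean distances (O(n^2); acceptable only for a small illustration).
    dists = [[math.dist(p, q) for q in points] for p in points]
    # Assumed density parameter: number of *other* points within radius eps.
    density = [sum(d <= eps for d in row) - 1 for row in dists]
    # Assumed outlier rule: sparse neighbourhood, i.e. density below the threshold.
    outliers = [i for i in range(n) if density[i] < min_neighbors]
    candidates = [i for i in range(n) if density[i] >= min_neighbors]
    if len(candidates) < k:
        raise ValueError("fewer than k dense points; adjust eps or min_neighbors")
    # First center: the densest candidate (ties go to the earliest point).
    chosen = [max(candidates, key=lambda i: density[i])]
    # Each further center: the candidate farthest from its nearest chosen center.
    while len(chosen) < k:
        chosen.append(max(candidates, key=lambda i: min(dists[i][c] for c in chosen)))
    return [points[i] for i in chosen], outliers

# Two tight groups plus one far-away point that should be flagged as isolated.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centers, outliers = density_based_init(points, k=2, eps=1.5, min_neighbors=2)
print(centers, outliers)
```

Because isolated points are excluded before the centers are chosen, the far-away point cannot become a center, which a purely distance-based farthest-point rule would allow. The full pairwise distance matrix in this sketch is itself quadratic in the number of points, so it does not address the large-data cost that the abstract raises.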
