Journal: 计算机应用 (Journal of Computer Applications)

K-Means Clustering Algorithm Based on Clustering Degree and Distance Equalization Optimization

     

Abstract

To address the sensitivity of the traditional K-means algorithm to the choice of initial cluster centers, a K-means clustering algorithm based on clustering degree and distance equalization optimization (K-MCD) is proposed. First, initial cluster centers are selected based on the idea of "clustering degree". Second, the final initial cluster centers are obtained by following a selection strategy that balances and optimizes the total distance among all cluster centers. Finally, the text collection is vectorized, and text clustering analysis is performed using the text cluster centers reselected by the optimization algorithm together with the clustering evaluation criteria. Simulation experiments on text data sets assessed both accuracy and stability. Compared with the K-means algorithm, K-MCD improved clustering accuracy on four text sets by 18.6, 17.5, 24.3, and 24.6 percentage points, respectively, and reduced the variance of the average number of iterations (evolutionary generations) by 36.99 percentage points. The results show that K-MCD effectively improves text clustering accuracy and has good stability.
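The two-stage initialization described in the abstract can be illustrated with a minimal sketch. The abstract does not define "clustering degree" or the exact distance-balancing rule, so everything below is an assumption rather than the authors' method: `cluster_degree` counts neighbors within a hypothetical `radius`, and the distance step is replaced by a plain farthest-first (max-min distance) pick among the densest candidates. The names `select_initial_centers`, `n_candidates`, and `kmeans` are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_degree(points, radius):
    """ASSUMPTION: 'clustering degree' = number of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def select_initial_centers(points, k, radius, n_candidates=None):
    """Pick k initial centers: dense candidates first, then spread them out."""
    degrees = cluster_degree(points, radius)
    # Stage 1: rank points by density (densest first, index breaks ties).
    order = sorted(range(len(points)), key=lambda i: (-degrees[i], i))
    m = n_candidates or min(len(points), 3 * k)
    candidates = order[:m]
    # Stage 2 (stand-in for the paper's distance balancing): farthest-first.
    # Start from the densest candidate, then repeatedly add the candidate
    # whose minimum distance to the already chosen centers is largest.
    chosen = [candidates[0]]
    while len(chosen) < k:
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(euclidean(points[c], points[s]) for s in chosen),
        )
        chosen.append(best)
    return [points[i] for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy 2-D data standing in for vectorized documents (three obvious groups).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
init = select_initial_centers(points, k=3, radius=1.5)
labels, centers = kmeans(points, init)
```

For text clustering, `points` would be document vectors (for example TF-IDF weights) rather than 2-D coordinates, and `radius` would have to be tuned to the vector space in use.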
