Journal: 情报学报

A Fuzzy Cluster Validity Index That Accounts for Differences in Cluster Size and Density

     

Abstract

Cluster validity indices are used to evaluate clustering quality and to determine the optimal number of clusters. For data sets whose clusters differ widely in size and density, and building on an analysis of the shortcomings of traditional fuzzy cluster validity indices, this paper proposes a validity index called COS that accounts for compactness, overlap, and separation at the same time. Intra-cluster compactness is expressed as the ratio of the sum of membership degrees within a certain threshold to the maximum intra-cluster distance. The overlap between two clusters is reflected by the difference, within a certain threshold, between each sample's membership degrees in those two clusters. Inter-cluster separation is measured by the minimum distance between clusters. The number of clusters that maximizes the COS index is taken as the optimal number. Clustering experiments with the fuzzy C-means algorithm on four artificial data sets and the real iris data set show that the COS index can effectively discover small clusters and low-density clusters.
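The abstract describes the ingredients of COS but not its exact formula, so the following Python sketch is only one plausible reading and not the authors' definition. It pairs a standard fuzzy C-means routine with a hypothetical `cos_like_index` that combines the three quantities named above. The thresholds `t_compact` and `t_overlap`, the way the three terms are combined, and all function names are assumptions introduced for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, U), with U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # each sample's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def cos_like_index(X, centers, U, t_compact=0.5, t_overlap=0.2):
    """Hypothetical COS-style score (an assumption, not the paper's formula).

    compactness: memberships >= t_compact, summed, over the largest
                 center-to-member distance, added up across clusters
    overlap:     samples whose memberships in two clusters differ by at
                 most t_overlap contribute (1 - difference)
    separation:  minimum distance between cluster centers
    """
    c = centers.shape[0]
    if c < 2:
        raise ValueError("need at least two clusters")
    d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
    labels = U.argmax(axis=0)

    compact = 0.0
    for i in range(c):
        members = labels == i
        if not members.any():
            return 0.0  # assumption: an empty cluster scores zero
        strong = U[i] >= t_compact
        compact += U[i, strong].sum() / max(d[i, members].max(), 1e-12)

    overlap = 0.0
    for i in range(c):
        for k in range(i + 1, c):
            diff = np.abs(U[i] - U[k])
            ambiguous = (diff <= t_overlap) & ((labels == i) | (labels == k))
            overlap += (1.0 - diff[ambiguous]).sum()

    sep = min(
        np.linalg.norm(centers[i] - centers[k])
        for i in range(c)
        for k in range(i + 1, c)
    )
    # Assumed combination: reward compactness and separation, penalize overlap.
    return float(compact * sep / (1.0 + overlap))


def select_cluster_number(X, candidates):
    """Run FCM for each candidate c and return the c with the largest score."""
    scores = {}
    for c in candidates:
        centers, U = fuzzy_c_means(X, c)
        scores[c] = cos_like_index(X, centers, U)
    best = max(scores, key=scores.get)
    return best, scores
```

A call such as `select_cluster_number(X, range(2, 8))` on an `(n, d)` NumPy array `X` would mirror the selection rule in the abstract, where the cluster number with the largest index value is chosen. Whether this stand-in reproduces the paper's behaviour on small or low-density clusters is not established here.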
