Journal: 《计算机应用与软件》 (Computer Applications and Software)

Research on Optimizing Base Clusterings in Cluster Ensembles

Abstract

A cluster ensemble combines multiple partitions of a dataset (the base clusterings) into a single new clustering that represents, as fully as possible, the cluster-structure information carried by all of the input base clusterings. The quality of the base clusterings is therefore crucial to the final ensemble result. K-means is the most widely used algorithm for generating base clusterings in traditional cluster ensembles: it is easy to implement, its computational cost is low, and its clustering mechanism conforms to the machine-learning assumption that the class-conditional probability of local data is constant. However, because K-means usually uses Euclidean distance directly as its distance measure, it can only find spherical clusters. For datasets with complex structure, especially those whose classes are defined by connectivity rather than distributed spherically, it cannot generate high-quality (that is, highly homogeneous) base clusterings. This paper therefore proposes an optimization method for base clusterings: judge the homogeneity of each cluster generated by K-means and partition the clusters with poor homogeneity again, which raises the homogeneity of the base clusterings and, in turn, the quality of the whole cluster ensemble. Experiments on 8 datasets show that the proposed method is effective.
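The refinement step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how homogeneity is judged or how a poor cluster is partitioned again, so everything specific here is an assumption made for illustration: a cluster counts as homogeneous when its points form a single group under an `eps` distance threshold, and a non-homogeneous cluster is split into its connected groups. The names `kmeans`, `components`, `refine`, and `eps` are hypothetical and are not taken from the paper.

```python
import math
import random

def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's K-means with Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def components(points, idx, eps):
    """Split the indices in `idx` into groups linked by distances <= eps."""
    unseen, groups = set(idx), []
    while unseen:
        stack = [unseen.pop()]
        group = list(stack)
        while stack:
            i = stack.pop()
            near = [j for j in unseen
                    if math.dist(points[i], points[j]) <= eps]
            unseen.difference_update(near)
            stack.extend(near)
            group.extend(near)
        groups.append(sorted(group))
    return groups

def refine(points, labels, eps):
    """ASSUMED homogeneity test: a cluster is kept only if it is eps-connected;
    otherwise it is re-partitioned into its connected groups."""
    refined, next_label = [None] * len(points), 0
    for c in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == c]
        for group in components(points, idx, eps):  # one group: cluster unchanged
            for i in group:
                refined[i] = next_label
            next_label += 1
    return refined

# Toy data: three tight, well-separated groups on a line.
points = [(x, 0.0) for x in (0.0, 0.5, 1.0, 10.0, 10.5, 11.0, 20.0, 20.5, 21.0)]
base = kmeans(points, k=2)              # K-means must merge two groups into one cluster
refined = refine(points, base, eps=1.0)
print(len(set(base)), len(set(refined)))
```

In this toy case K-means with `k=2` has to place two of the three groups in one cluster, and the refinement separates them again. The refined labelings, rather than the raw K-means labelings, would then be passed to whatever consensus function the ensemble uses.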
